1 00:00:00,000 --> 00:00:00,120 2 00:00:00,120 --> 00:00:02,500 The following content is provided under a Creative 3 00:00:02,500 --> 00:00:03,910 Commons license. 4 00:00:03,910 --> 00:00:06,950 Your support will help MIT OpenCourseWare continue to 5 00:00:06,950 --> 00:00:10,600 offer high-quality educational resources for free. 6 00:00:10,600 --> 00:00:13,500 To make a donation, or view additional materials from 7 00:00:13,500 --> 00:00:18,130 hundreds of MIT courses, visit MIT OpenCourseWare at 8 00:00:18,130 --> 00:00:20,110 ocw.mit.edu. 9 00:00:20,110 --> 00:00:28,270 PROFESSOR: Welcome to 6.172, Performance Engineering of 10 00:00:28,270 --> 00:00:31,520 Software Systems. 11 00:00:31,520 --> 00:00:35,050 This is a fun class. 12 00:00:35,050 --> 00:00:38,900 It's all about going fast. 13 00:00:38,900 --> 00:00:40,590 We all like to go fast-- 14 00:00:40,590 --> 00:00:42,930 well, not all of us. 15 00:00:42,930 --> 00:00:46,540 But everybody in this class likes to go fast. 16 00:00:46,540 --> 00:00:48,880 Otherwise you won't like the class very much. 17 00:00:48,880 --> 00:00:50,835 It's all about making stuff go fast. 18 00:00:50,835 --> 00:00:55,670 19 00:00:55,670 --> 00:00:57,770 So a couple of brief announcements. 20 00:00:57,770 --> 00:01:01,230 We're going to be videotaping for OpenCourseWare. 21 00:01:01,230 --> 00:01:05,360 So if you don't want to be videotaped, you should sit it 22 00:01:05,360 --> 00:01:09,500 in a place where the camera won't catch you. 23 00:01:09,500 --> 00:01:11,950 Because we're in such a large classroom, I'd like everybody 24 00:01:11,950 --> 00:01:16,290 to sort of every day, without me having to draw-- 25 00:01:16,290 --> 00:01:18,690 I'm not really an artist. 26 00:01:18,690 --> 00:01:20,470 My sister's an artist, actually. 27 00:01:20,470 --> 00:01:21,820 She's a painter. 28 00:01:21,820 --> 00:01:24,340 But maybe that gene escaped me. 29 00:01:24,340 --> 00:01:27,280 It went that way and not this way. 30 00:01:27,280 --> 00:01:29,390 So I don't want to have to draw it up again. 31 00:01:29,390 --> 00:01:31,810 So everybody, try to confine yourself to this sort of 32 00:01:31,810 --> 00:01:32,985 quarter of the classroom. 33 00:01:32,985 --> 00:01:36,860 And that way, it'll feel more like a size room that 34 00:01:36,860 --> 00:01:38,110 we ought to be in. 35 00:01:38,110 --> 00:01:41,340 36 00:01:41,340 --> 00:01:51,060 Let me go to our course information sheet. 37 00:01:51,060 --> 00:01:52,190 Did everybody get a copy of the 38 00:01:52,190 --> 00:01:55,750 course information handout? 39 00:01:55,750 --> 00:01:57,830 So we're going to the class-- we're going to start at the 40 00:01:57,830 --> 00:02:03,300 beginning with the Administrivia stuff, try to 41 00:02:03,300 --> 00:02:05,610 run through that as quickly as we can. 42 00:02:05,610 --> 00:02:11,009 And then my co-lecturer, professor Saman Amarasinghe-- 43 00:02:11,009 --> 00:02:13,290 if you can't pronounce or spell his last name, you can 44 00:02:13,290 --> 00:02:14,540 just say "Saman"-- 45 00:02:14,540 --> 00:02:16,660 46 00:02:16,660 --> 00:02:20,570 will give today's lecture. 47 00:02:20,570 --> 00:02:23,520 And we have four teaching assistants. 48 00:02:23,520 --> 00:02:26,240 So teaching assistants, please stand up. 49 00:02:26,240 --> 00:02:32,010 They are Josh, John, Reed, and Eric. 50 00:02:32,010 --> 00:02:34,910 And let me tell you, what they have done in the last few days 51 00:02:34,910 --> 00:02:37,340 to get us ready for this course is nothing short of 52 00:02:37,340 --> 00:02:39,970 superhuman. 53 00:02:39,970 --> 00:02:43,880 So you really need to accord them your respect, and they're 54 00:02:43,880 --> 00:02:46,060 going to really work hard to help you folks do 55 00:02:46,060 --> 00:02:48,860 well in this class. 56 00:02:48,860 --> 00:02:50,630 We have, I think, a pretty decent ratio. 57 00:02:50,630 --> 00:02:54,290 I think even with concerns about cutbacks, et cetera, to 58 00:02:54,290 --> 00:02:58,350 have four TAs for this is really good. 59 00:02:58,350 --> 00:03:00,870 And we need it, because it's a software class. 60 00:03:00,870 --> 00:03:08,610 And software is, as you know, bugs can take an arbitrarily 61 00:03:08,610 --> 00:03:09,860 long time to find. 62 00:03:09,860 --> 00:03:14,060 63 00:03:14,060 --> 00:03:15,180 You guys can sit. 64 00:03:15,180 --> 00:03:20,660 Besides our teaching assistants, we have two 65 00:03:20,660 --> 00:03:24,770 support staff people helping us, Mary McDavitt, who works 66 00:03:24,770 --> 00:03:28,700 for Saman, and Marcia Davidson, who works for me. 67 00:03:28,700 --> 00:03:31,100 And they're generally pretty good about being able to track 68 00:03:31,100 --> 00:03:36,490 us down if we're in some far corner of 69 00:03:36,490 --> 00:03:39,760 the world or something. 70 00:03:39,760 --> 00:03:43,680 Our lectures are going to be held on Tuesdays and Thursdays 71 00:03:43,680 --> 00:03:46,950 in this room, 2:30 to 4:00 PM. 72 00:03:46,950 --> 00:03:48,650 You should plan to attend regularly. 73 00:03:48,650 --> 00:03:53,720 We dispense all kinds of wonderful gems that you will 74 00:03:53,720 --> 00:03:57,650 miss if you think you're going to get it just from reading 75 00:03:57,650 --> 00:04:01,660 and even looking at videotapes or whatever. 76 00:04:01,660 --> 00:04:03,300 I'm not sure what our arrangement is going to be 77 00:04:03,300 --> 00:04:05,640 about having videos available. 78 00:04:05,640 --> 00:04:11,580 OK, they're not going to be available, except unless 79 00:04:11,580 --> 00:04:13,730 there's something really extenuating circumstance. 80 00:04:13,730 --> 00:04:20,670 You have to go into a hospital to have your forebrain removed 81 00:04:20,670 --> 00:04:23,150 or something-- 82 00:04:23,150 --> 00:04:24,400 something serious, OK? 83 00:04:24,400 --> 00:04:27,450 84 00:04:27,450 --> 00:04:29,990 All material in the lecture, even if it's delivered oral, 85 00:04:29,990 --> 00:04:32,340 is fair game for projects and quizzes. 86 00:04:32,340 --> 00:04:36,360 We will post the lecture slides, but they don't 87 00:04:36,360 --> 00:04:40,030 generally convey all of the information. 88 00:04:40,030 --> 00:04:40,770 And we're going to be 89 00:04:40,770 --> 00:04:42,200 video-recorded, as I mentioned. 90 00:04:42,200 --> 00:04:45,970 If you don't want to appear, sit behind the camera, and you 91 00:04:45,970 --> 00:04:47,760 won't appear. 92 00:04:47,760 --> 00:04:51,020 We have no recitations in this class. 93 00:04:51,020 --> 00:04:53,490 However, the teaching assistants will periodically 94 00:04:53,490 --> 00:04:59,270 hold primers where we will bring you up to speed on 95 00:04:59,270 --> 00:05:02,620 various technologies that you may not be up to speed in, 96 00:05:02,620 --> 00:05:07,770 such as C. How many people have programmed in C before? 97 00:05:07,770 --> 00:05:10,075 And how many have not programmed in C before? 98 00:05:10,075 --> 00:05:10,820 A few. 99 00:05:10,820 --> 00:05:12,070 OK. 100 00:05:12,070 --> 00:05:13,980 101 00:05:13,980 --> 00:05:18,590 Tools such as VTune. 102 00:05:18,590 --> 00:05:19,880 Who's used VTune before? 103 00:05:19,880 --> 00:05:22,790 104 00:05:22,790 --> 00:05:25,340 Who has not used VTune? 105 00:05:25,340 --> 00:05:26,590 Good. 106 00:05:26,590 --> 00:05:28,560 107 00:05:28,560 --> 00:05:31,780 Pin-- who's used Pin? 108 00:05:31,780 --> 00:05:32,920 OK. 109 00:05:32,920 --> 00:05:34,800 Who has used Cilk++? 110 00:05:34,800 --> 00:05:37,930 111 00:05:37,930 --> 00:05:39,510 OK, a couple people. 112 00:05:39,510 --> 00:05:42,275 So you see, we're going to have some primers to bring you 113 00:05:42,275 --> 00:05:43,620 up to speed in those kinds of things. 114 00:05:43,620 --> 00:05:46,800 We don't expect that you came in knowing any of those 115 00:05:46,800 --> 00:05:48,050 things, particularly. 116 00:05:48,050 --> 00:05:51,020 117 00:05:51,020 --> 00:05:55,290 We're going to be using Stellar for our course 118 00:05:55,290 --> 00:05:56,590 management. 119 00:05:56,590 --> 00:05:58,400 And here's the URL. 120 00:05:58,400 --> 00:06:01,240 And you should make sure that you're registered as a member 121 00:06:01,240 --> 00:06:04,100 of 6.172, because otherwise, you're not going to get any 122 00:06:04,100 --> 00:06:04,700 announcements. 123 00:06:04,700 --> 00:06:05,580 We're going to do all of our 124 00:06:05,580 --> 00:06:07,400 administration through Stellar. 125 00:06:07,400 --> 00:06:10,040 So make sure that you're registered there. 126 00:06:10,040 --> 00:06:13,670 127 00:06:13,670 --> 00:06:17,140 At the end of today, I think it is, we'll send out an 128 00:06:17,140 --> 00:06:21,950 announcement to the student mailing list. 129 00:06:21,950 --> 00:06:25,330 If you don't get something by the end today-- 130 00:06:25,330 --> 00:06:27,660 so you wake up tomorrow morning, you have no email 131 00:06:27,660 --> 00:06:29,400 from the course staff. 132 00:06:29,400 --> 00:06:32,592 What that means is you should email us. 133 00:06:32,592 --> 00:06:34,632 AUDIENCE: Well, that mailing list would be the 134 00:06:34,632 --> 00:06:35,500 preregistered ones. 135 00:06:35,500 --> 00:06:37,520 PROFESSOR: That'd be the preregistered ones. 136 00:06:37,520 --> 00:06:39,420 And then that way, we'll get all the people. 137 00:06:39,420 --> 00:06:41,710 So if you know you're not preregistered, you can send us 138 00:06:41,710 --> 00:06:42,960 email before then. 139 00:06:42,960 --> 00:06:45,650 140 00:06:45,650 --> 00:06:49,190 There are two one-hour quizzes given during class time. 141 00:06:49,190 --> 00:06:50,450 AUDIENCE: One and a half. 142 00:06:50,450 --> 00:06:51,290 PROFESSOR: What's that? 143 00:06:51,290 --> 00:06:52,520 AUDIENCE: One and a half hour. 144 00:06:52,520 --> 00:06:53,980 PROFESSOR: Oh, you're right. 145 00:06:53,980 --> 00:06:56,530 It's one and a half hour quizzes. 146 00:06:56,530 --> 00:06:58,440 Yes, two one and a half hour quizzes. 147 00:06:58,440 --> 00:07:00,150 We'll correct that in the online version. 148 00:07:00,150 --> 00:07:03,420 149 00:07:03,420 --> 00:07:06,190 Given during class time. 150 00:07:06,190 --> 00:07:08,300 Can somebody, by the way, correct this as we go along? 151 00:07:08,300 --> 00:07:10,780 Great. 152 00:07:10,780 --> 00:07:13,120 The dates are on the course calendar, which is available 153 00:07:13,120 --> 00:07:14,460 on Stellar. 154 00:07:14,460 --> 00:07:19,850 We've had mixed results with the ICS feed from Stellar, but 155 00:07:19,850 --> 00:07:24,075 you're supposed to be able to download the calendar to your 156 00:07:24,075 --> 00:07:29,040 iPhone or other ICS-capable devices. 157 00:07:29,040 --> 00:07:33,590 But we were successful in downloading, but all the times 158 00:07:33,590 --> 00:07:35,620 were screwed up. 159 00:07:35,620 --> 00:07:38,750 But anyway, hopefully, maybe they'll allow that. 160 00:07:38,750 --> 00:07:40,870 But anyway, the calendar is there. 161 00:07:40,870 --> 00:07:44,940 The quizzes will generally be closed book and closed notes, 162 00:07:44,940 --> 00:07:48,970 but you will be permitted crib sheets, one sheet that you 163 00:07:48,970 --> 00:07:52,340 summarize anything that you feel that you need to know. 164 00:07:52,340 --> 00:07:54,930 165 00:07:54,930 --> 00:07:58,760 Generally, everything that we cover in here is fair game for 166 00:07:58,760 --> 00:08:00,770 the quizzes, including things that are in 167 00:08:00,770 --> 00:08:02,930 prerequisite courses. 168 00:08:02,930 --> 00:08:07,100 There is no final exam because we have a final project. 169 00:08:07,100 --> 00:08:12,490 This is a fast-moving class, so if you get caught behind, 170 00:08:12,490 --> 00:08:16,680 it's not a good sign, because it's really hard to work 171 00:08:16,680 --> 00:08:18,820 double-hard to keep up in this class. 172 00:08:18,820 --> 00:08:21,120 You really have to keep on top of things. 173 00:08:21,120 --> 00:08:23,840 And so for that reason and others-- 174 00:08:23,840 --> 00:08:26,440 because it's really difficult for us-- 175 00:08:26,440 --> 00:08:28,370 it's fast moving for us as well-- 176 00:08:28,370 --> 00:08:30,390 we generally cannot accept late projects. 177 00:08:30,390 --> 00:08:33,030 If you haven't been able to complete a project, you should 178 00:08:33,030 --> 00:08:37,100 just package up your partial solution and hand that in, and 179 00:08:37,100 --> 00:08:39,970 you'll get partial credit on it. 180 00:08:39,970 --> 00:08:44,130 Do not just say, oh, I'm not going to hand something in, 181 00:08:44,130 --> 00:08:47,060 and expect it to get it done later. 182 00:08:47,060 --> 00:08:50,360 If you do find yourself in an unusual situation which you 183 00:08:50,360 --> 00:08:52,960 think there are some extenuating circumstances, 184 00:08:52,960 --> 00:08:56,820 like the aforementioned brain surgery or something, let us 185 00:08:56,820 --> 00:08:58,040 know in advance. 186 00:08:58,040 --> 00:09:00,150 So if you're going to get in a car accident, it's really 187 00:09:00,150 --> 00:09:04,810 helpful if you tell us the day before so that we can plan. 188 00:09:04,810 --> 00:09:07,070 Obviously can't always do that, but if it is something 189 00:09:07,070 --> 00:09:09,060 for which there is any reasonable expectation it 190 00:09:09,060 --> 00:09:13,980 should be done beforehand, you should let us know. 191 00:09:13,980 --> 00:09:17,870 And we may require, for those kinds of exceptions, some kind 192 00:09:17,870 --> 00:09:22,470 of confirmation from a dean or medical professional. 193 00:09:22,470 --> 00:09:26,730 So if that's the route that you're in, you can plan a 194 00:09:26,730 --> 00:09:30,190 little bit and make sure that those confirmations 195 00:09:30,190 --> 00:09:32,510 are on their way. 196 00:09:32,510 --> 00:09:36,530 The grading, this is essentially approximately how 197 00:09:36,530 --> 00:09:39,090 we're going to grade. 198 00:09:39,090 --> 00:09:42,900 As you can see, there's a bunch of content in the 199 00:09:42,900 --> 00:09:46,360 projects and a bunch in the quizzes. 200 00:09:46,360 --> 00:09:49,970 And then there's also participation grade. 201 00:09:49,970 --> 00:09:53,970 And the participation is both in-- 202 00:09:53,970 --> 00:09:57,090 we will have design reviews and such, so that's part of 203 00:09:57,090 --> 00:09:59,030 the participation, as well as asking 204 00:09:59,030 --> 00:10:01,150 questions and such in class. 205 00:10:01,150 --> 00:10:03,320 And because we're video-recording everything, we 206 00:10:03,320 --> 00:10:05,990 know exactly who's asking questions, unless you choose 207 00:10:05,990 --> 00:10:09,410 to sit outside the video zone. 208 00:10:09,410 --> 00:10:13,760 209 00:10:13,760 --> 00:10:21,470 In addition, if there is a significant missing element to 210 00:10:21,470 --> 00:10:23,920 your work, it doesn't matter how well you did on the other 211 00:10:23,920 --> 00:10:26,070 things, you will receive a failing grade. 212 00:10:26,070 --> 00:10:30,390 You must, essentially, do all the work. 213 00:10:30,390 --> 00:10:35,490 So if you do not get substantial credit on the 214 00:10:35,490 --> 00:10:39,450 final project or any other two assignments, for example, you 215 00:10:39,450 --> 00:10:42,110 can expect to receive a failing grade. 216 00:10:42,110 --> 00:10:44,660 It goes downhill very quickly. 217 00:10:44,660 --> 00:10:46,330 If you're missing one assignment, generally don't 218 00:10:46,330 --> 00:10:50,320 expect to get higher than a C, for example. 219 00:10:50,320 --> 00:10:54,230 It goes downhill very, very quickly, because that's what 220 00:10:54,230 --> 00:10:55,480 this is all about is hands on. 221 00:10:55,480 --> 00:11:00,120 222 00:11:00,120 --> 00:11:03,430 What's permissible in this class? 223 00:11:03,430 --> 00:11:06,340 So it's your responsibility to satisfy both the letter and 224 00:11:06,340 --> 00:11:08,200 spirit of the rules. 225 00:11:08,200 --> 00:11:10,380 So if you have any questions, let us know. 226 00:11:10,380 --> 00:11:12,210 Here's what the rules are. 227 00:11:12,210 --> 00:11:15,680 Also, let me just say how Saman and I treat this. 228 00:11:15,680 --> 00:11:19,730 We have been involved in cheating incidents off and on 229 00:11:19,730 --> 00:11:21,020 over the years. 230 00:11:21,020 --> 00:11:23,180 I've been at MIT almost 30 years. 231 00:11:23,180 --> 00:11:24,823 Saman, you've been here-- 232 00:11:24,823 --> 00:11:25,276 PROFESSOR: 13. 233 00:11:25,276 --> 00:11:27,230 PROFESSOR: 13 years. 234 00:11:27,230 --> 00:11:29,720 So we've been here long time, longer than you folks. 235 00:11:29,720 --> 00:11:32,360 We've been through this a bunch of times. 236 00:11:32,360 --> 00:11:34,510 It's really a pain when somebody cheats. 237 00:11:34,510 --> 00:11:39,390 It robs me of a huge amount of my time that I could otherwise 238 00:11:39,390 --> 00:11:44,000 be spending on teaching and students and my own 239 00:11:44,000 --> 00:11:46,000 professional development. 240 00:11:46,000 --> 00:11:51,250 However, we both take cheating incidents extremely seriously. 241 00:11:51,250 --> 00:11:55,950 So we have this thing that is sometimes really inconvenient 242 00:11:55,950 --> 00:11:56,970 called integrity. 243 00:11:56,970 --> 00:11:59,400 And I'd rather, if I find somebody's cheating, just give 244 00:11:59,400 --> 00:12:01,970 you an F and not deal with it. 245 00:12:01,970 --> 00:12:03,910 Drat. 246 00:12:03,910 --> 00:12:07,510 Instead, when somebody's cheating, I tend to want to 247 00:12:07,510 --> 00:12:10,750 take it to the Committee on Discipline, where you face 248 00:12:10,750 --> 00:12:12,810 things like expulsion, et cetera. 249 00:12:12,810 --> 00:12:14,490 So I don't like to do that. 250 00:12:14,490 --> 00:12:15,830 It's very time consuming for me. 251 00:12:15,830 --> 00:12:21,450 But I find it hard to just look the other way, because I 252 00:12:21,450 --> 00:12:25,900 view the issue of integrity as protecting the vast majority 253 00:12:25,900 --> 00:12:30,790 of students who are obeying the rules. 254 00:12:30,790 --> 00:12:36,150 So when a cheating incident occurs, I sort of feel honor 255 00:12:36,150 --> 00:12:39,500 bound to push it to its proper conclusion. 256 00:12:39,500 --> 00:12:42,020 257 00:12:42,020 --> 00:12:45,660 So please don't cheat. 258 00:12:45,660 --> 00:12:49,250 We will use a variety of means at our disposal, including 259 00:12:49,250 --> 00:12:52,110 technological ones, to detect cheating. 260 00:12:52,110 --> 00:12:55,140 Also, if you think that sharing your results with 261 00:12:55,140 --> 00:12:57,640 somebody else means that they're 262 00:12:57,640 --> 00:12:59,310 cheating and you're not-- 263 00:12:59,310 --> 00:13:01,650 in some cultures that's the way it's treated-- 264 00:13:01,650 --> 00:13:02,900 no. 265 00:13:02,900 --> 00:13:04,690 266 00:13:04,690 --> 00:13:08,200 In this culture, in particular the culture in this classroom, 267 00:13:08,200 --> 00:13:12,010 they're both equally guilty of academic dishonesty, whether 268 00:13:12,010 --> 00:13:16,880 you are the giver or the taker. 269 00:13:16,880 --> 00:13:21,380 Equally, we treat that as equal offense. 270 00:13:21,380 --> 00:13:22,770 So here's the particular things. 271 00:13:22,770 --> 00:13:27,760 You may not share, generally, your solutions with people who 272 00:13:27,760 --> 00:13:29,630 are not in your group. 273 00:13:29,630 --> 00:13:31,310 So we will have a bunch of group projects, 274 00:13:31,310 --> 00:13:32,560 generally in pairs. 275 00:13:32,560 --> 00:13:37,020 276 00:13:37,020 --> 00:13:42,030 And that includes both people in the class and people 277 00:13:42,030 --> 00:13:44,590 outside the class. 278 00:13:44,590 --> 00:13:47,840 So generally, you keep things to yourself. 279 00:13:47,840 --> 00:13:49,960 When you're in a group, of course, you can share ideas 280 00:13:49,960 --> 00:13:51,930 within the group. 281 00:13:51,930 --> 00:13:56,440 But we expect that everybody's making a fair contribution. 282 00:13:56,440 --> 00:13:59,740 So you should be concerned if you are not holding up your 283 00:13:59,740 --> 00:14:01,630 end of your group participation. 284 00:14:01,630 --> 00:14:04,300 285 00:14:04,300 --> 00:14:07,150 And we're going to ask you to describe the contributions of 286 00:14:07,150 --> 00:14:08,150 the people in your group. 287 00:14:08,150 --> 00:14:10,600 Generally, the group, as I say, is two. 288 00:14:10,600 --> 00:14:13,600 So it's basically saying, what did you do? 289 00:14:13,600 --> 00:14:14,970 How did you divide up your project? 290 00:14:14,970 --> 00:14:15,470 What did you do? 291 00:14:15,470 --> 00:14:18,450 What did the other person do? 292 00:14:18,450 --> 00:14:21,860 You generally can't share it with anybody else. 293 00:14:21,860 --> 00:14:26,150 You can't copy or transcribe solutions from other places. 294 00:14:26,150 --> 00:14:27,880 The work you submit must be your own. 295 00:14:27,880 --> 00:14:29,510 Pretty straightforward. 296 00:14:29,510 --> 00:14:33,380 You may go out and use general conceptual material, such that 297 00:14:33,380 --> 00:14:38,190 you might get from a good textbook or from Wikipedia or 298 00:14:38,190 --> 00:14:39,120 someplace like that. 299 00:14:39,120 --> 00:14:43,000 That's perfectly fine, resources you would ordinarily 300 00:14:43,000 --> 00:14:44,610 have available. 301 00:14:44,610 --> 00:14:49,220 But if you do use any material from an external source, you 302 00:14:49,220 --> 00:14:56,590 should properly cite it in normal academic that fashion. 303 00:14:56,590 --> 00:14:59,360 Cite where it is so that somebody else can go and find 304 00:14:59,360 --> 00:15:02,830 and look to see what it is that you're citing. 305 00:15:02,830 --> 00:15:07,670 If you feel that you have transgressed in 306 00:15:07,670 --> 00:15:10,160 some way, let us know. 307 00:15:10,160 --> 00:15:12,840 It always goes way easier. 308 00:15:12,840 --> 00:15:16,380 I generally tend to be kind of a nice guy when somebody comes 309 00:15:16,380 --> 00:15:22,260 to me saying, oops, verses I hear from somewhere else or 310 00:15:22,260 --> 00:15:26,340 one of our tools detects something else. 311 00:15:26,340 --> 00:15:28,910 Then I'm kind of cranky. 312 00:15:28,910 --> 00:15:31,950 So I'd rather, if you come to me, we could work something 313 00:15:31,950 --> 00:15:35,070 out, generally. 314 00:15:35,070 --> 00:15:39,600 Most of your out-of-class time will spent 315 00:15:39,600 --> 00:15:41,820 completing six projects. 316 00:15:41,820 --> 00:15:44,110 And there's an outline of what the projects are. 317 00:15:44,110 --> 00:15:47,710 They're all fun projects. 318 00:15:47,710 --> 00:15:48,870 You can ask the TAs. 319 00:15:48,870 --> 00:15:51,930 The TAs will tell you how much fun they are, because they've 320 00:15:51,930 --> 00:15:56,660 generally helped to design and built their own 321 00:15:56,660 --> 00:15:58,550 solutions to them. 322 00:15:58,550 --> 00:16:00,420 Some will be done in pairs, some individually. 323 00:16:00,420 --> 00:16:04,340 The final project can either be done in pairs or in threes, 324 00:16:04,340 --> 00:16:05,910 I think we said. 325 00:16:05,910 --> 00:16:06,950 AUDIENCE: [INAUDIBLE] 326 00:16:06,950 --> 00:16:09,490 PROFESSOR: Yes, we also have-- we didn't put that in here. 327 00:16:09,490 --> 00:16:16,960 We also have a Project 0, which will be available today 328 00:16:16,960 --> 00:16:17,970 after class. 329 00:16:17,970 --> 00:16:21,170 So you should go to Stellar and get Project 0. 330 00:16:21,170 --> 00:16:24,690 Project 0 is not to be handed in. 331 00:16:24,690 --> 00:16:28,160 It's getting warmed up with all of the machines and tools 332 00:16:28,160 --> 00:16:29,800 that we'll be using in the class. 333 00:16:29,800 --> 00:16:33,730 We have been very fortunate to have very nice contributions 334 00:16:33,730 --> 00:16:46,028 from Dell and Intel of a new Cluster facility that has 16 335 00:16:46,028 --> 00:16:47,278 12-core machines. 336 00:16:47,278 --> 00:16:50,740 337 00:16:50,740 --> 00:16:56,370 And they're Intel Core i7 machines. 338 00:16:56,370 --> 00:17:01,620 Each chip has six cores, and there's two cores per machine, 339 00:17:01,620 --> 00:17:06,790 each with 48 gigabytes of memory. 340 00:17:06,790 --> 00:17:11,400 So these are kind of honking modern facility. 341 00:17:11,400 --> 00:17:13,270 So as I say, it's fun. 342 00:17:13,270 --> 00:17:16,589 It's fun when you do something on your laptop, and then you 343 00:17:16,589 --> 00:17:20,160 go and you do it on one of these machines and 344 00:17:20,160 --> 00:17:21,960 it goes like this. 345 00:17:21,960 --> 00:17:23,840 It's really fun when these things go fast. 346 00:17:23,840 --> 00:17:26,960 347 00:17:26,960 --> 00:17:31,450 So Project 0, you should start on today. 348 00:17:31,450 --> 00:17:37,130 And it's basically the handout called Getting Started. 349 00:17:37,130 --> 00:17:39,890 It will be available on Stellar. 350 00:17:39,890 --> 00:17:43,210 And it will take you through getting access to these 351 00:17:43,210 --> 00:17:44,600 machines and so forth. 352 00:17:44,600 --> 00:17:48,220 In particular, in order to use these machines, you must have 353 00:17:48,220 --> 00:17:51,040 a CSAIL account. 354 00:17:51,040 --> 00:17:54,480 Who does not have a CSAIL account? 355 00:17:54,480 --> 00:17:57,950 So that is going to be the first thing you need to do, 356 00:17:57,950 --> 00:17:59,040 because you've got to get that. 357 00:17:59,040 --> 00:18:00,960 So what you want to do is get 358 00:18:00,960 --> 00:18:04,110 registered for a CSAIL account-- 359 00:18:04,110 --> 00:18:05,250 and we give instructions in there. 360 00:18:05,250 --> 00:18:08,200 It's basically going to a web page-- 361 00:18:08,200 --> 00:18:11,280 pretty much as soon as you get out of the classroom. 362 00:18:11,280 --> 00:18:14,000 Because you won't be able to start anything until you have 363 00:18:14,000 --> 00:18:15,910 this CSAIL account. 364 00:18:15,910 --> 00:18:18,530 And hopefully, we'll be able to give you access within a 365 00:18:18,530 --> 00:18:19,780 few hours after you register. 366 00:18:19,780 --> 00:18:24,700 367 00:18:24,700 --> 00:18:25,310 Yeah, question? 368 00:18:25,310 --> 00:18:27,938 AUDIENCE: What operating systems are the tools-- 369 00:18:27,938 --> 00:18:28,810 PROFESSOR: Linux. 370 00:18:28,810 --> 00:18:31,380 We'll be using Linux. 371 00:18:31,380 --> 00:18:35,140 So there are a variety of ways that we describe for how you 372 00:18:35,140 --> 00:18:37,860 use these machines remotely. 373 00:18:37,860 --> 00:18:40,400 We have the Andrew File System-- 374 00:18:40,400 --> 00:18:41,400 AFS-- 375 00:18:41,400 --> 00:18:44,690 which you can use so that you can edit things locally but 376 00:18:44,690 --> 00:18:50,010 have an SSH terminal window on the machine, which makes it 377 00:18:50,010 --> 00:18:51,160 very convenient. 378 00:18:51,160 --> 00:18:54,420 On the other hand, for some people, often some dorm rooms 379 00:18:54,420 --> 00:18:58,030 with poor connectivity, doing things by AFS may be 380 00:18:58,030 --> 00:18:59,460 too slow for you. 381 00:18:59,460 --> 00:19:01,535 But there are a variety of other ways that you can-- you 382 00:19:01,535 --> 00:19:03,680 can actually edit right on the machine. 383 00:19:03,680 --> 00:19:07,470 And there are a variety of ways of having a repository on 384 00:19:07,470 --> 00:19:10,510 your machine, because we'll be using Git as 385 00:19:10,510 --> 00:19:15,730 our versioning mechanism. 386 00:19:15,730 --> 00:19:18,160 So you'd be able to have it there and be able to make 387 00:19:18,160 --> 00:19:19,520 copies back on the other machine. 388 00:19:19,520 --> 00:19:21,330 So there are a variety of ways to work. 389 00:19:21,330 --> 00:19:24,530 And you're free to make the software run on your own 390 00:19:24,530 --> 00:19:26,920 laptops or workstations. 391 00:19:26,920 --> 00:19:30,290 The one thing I'll say is we will not support that. 392 00:19:30,290 --> 00:19:39,430 We don't have the resources to support beyond using the 393 00:19:39,430 --> 00:19:42,310 machines that we actually provide. 394 00:19:42,310 --> 00:19:45,970 I think I've got to go faster here. 395 00:19:45,970 --> 00:19:51,000 So let's see. 396 00:19:51,000 --> 00:19:52,250 So software engineering. 397 00:19:52,250 --> 00:19:54,840 398 00:19:54,840 --> 00:19:58,380 So research in programmer productivity has demonstrated 399 00:19:58,380 --> 00:20:02,080 that pair programming, where two programmers sit together, 400 00:20:02,080 --> 00:20:07,460 one at the keyboard and one looking on, is actually faster 401 00:20:07,460 --> 00:20:11,580 than two programmers coding separately. 402 00:20:11,580 --> 00:20:14,010 Why do you suppose that is? 403 00:20:14,010 --> 00:20:15,021 Yeah? 404 00:20:15,021 --> 00:20:16,985 AUDIENCE: You don't get stuck. 405 00:20:16,985 --> 00:20:19,440 The other person can tell you-- it's like when you're 406 00:20:19,440 --> 00:20:21,895 [INAUDIBLE] writing on the blackboard, you get nervous. 407 00:20:21,895 --> 00:20:22,877 You don't know what you're doing. 408 00:20:22,877 --> 00:20:23,859 [INAUDIBLE] 409 00:20:23,859 --> 00:20:25,830 PROFESSOR: Yeah, you could be stuck. 410 00:20:25,830 --> 00:20:26,160 What else? 411 00:20:26,160 --> 00:20:28,108 AUDIENCE: You don't procrastinate. 412 00:20:28,108 --> 00:20:30,543 PROFESSOR: Yeah, you don't procrastinate. 413 00:20:30,543 --> 00:20:31,620 That's a good one. 414 00:20:31,620 --> 00:20:32,260 Don't procrastinate. 415 00:20:32,260 --> 00:20:33,460 AUDIENCE: Finding bugs. 416 00:20:33,460 --> 00:20:34,730 PROFESSOR: Yeah, finding bugs. 417 00:20:34,730 --> 00:20:39,750 So the main one is finding bugs, that the person who's 418 00:20:39,750 --> 00:20:41,430 looking on finds bugs. 419 00:20:41,430 --> 00:20:43,820 In fact, I was just pair programming with Reed the 420 00:20:43,820 --> 00:20:46,070 other day, and we were trying to figure out, why is this 421 00:20:46,070 --> 00:20:48,340 thing not working? 422 00:20:48,340 --> 00:20:50,750 And I say, gee, it looks fine. 423 00:20:50,750 --> 00:20:54,670 It says, next y equals x. 424 00:20:54,670 --> 00:20:58,590 And he says, no, next y is supposed to equal y. 425 00:20:58,590 --> 00:21:01,810 And it's like, we had both read over that code, but as 426 00:21:01,810 --> 00:21:04,080 soon as I spelled it out, I didn't even notice. 427 00:21:04,080 --> 00:21:06,080 But of course, him in hearing it says, oh, it's supposed to 428 00:21:06,080 --> 00:21:08,880 be x and y and y, not x and y. 429 00:21:08,880 --> 00:21:12,600 So anyway, that kind of thing, you pick up very quickly. 430 00:21:12,600 --> 00:21:15,060 It makes it very easy to debug. 431 00:21:15,060 --> 00:21:19,070 Likewise, regression testing demonstrably promotes fast 432 00:21:19,070 --> 00:21:20,780 co-development. 433 00:21:20,780 --> 00:21:24,670 So this is where you build a battery of tests that includes 434 00:21:24,670 --> 00:21:28,040 tests for every mistake that you've ever made. 435 00:21:28,040 --> 00:21:31,030 And it can also be helpful to write unit tests, where you 436 00:21:31,030 --> 00:21:32,410 test individual ones. 437 00:21:32,410 --> 00:21:36,190 And you should also use tools like the assert package. 438 00:21:36,190 --> 00:21:40,060 And some of these things will be taken through in Project 0. 439 00:21:40,060 --> 00:21:44,120 440 00:21:44,120 --> 00:21:45,910 The MIT POSSE. 441 00:21:45,910 --> 00:21:49,270 We have recruited senior software engineers in the 442 00:21:49,270 --> 00:21:52,420 region to share with you their invaluable knowledge and 443 00:21:52,420 --> 00:21:54,020 experience. 444 00:21:54,020 --> 00:21:57,480 We call them Masters In The Practice Of Software Systems 445 00:21:57,480 --> 00:22:01,670 Engineering, which has the great acronym MIT POSSE. 446 00:22:01,670 --> 00:22:04,580 These are people who are not being paid. 447 00:22:04,580 --> 00:22:06,520 And they're senior. 448 00:22:06,520 --> 00:22:09,130 They're doing this because they're volunteering, because 449 00:22:09,130 --> 00:22:11,420 they want to help. 450 00:22:11,420 --> 00:22:14,540 You have a lot to learn from these people, and we will have 451 00:22:14,540 --> 00:22:17,060 code and design reviews with these people with your 452 00:22:17,060 --> 00:22:18,240 assigned master. 453 00:22:18,240 --> 00:22:20,840 You want to treat them nicely. 454 00:22:20,840 --> 00:22:29,990 They're not doing this so that some snotty-nosed kid can say, 455 00:22:29,990 --> 00:22:31,940 I don't care about you. 456 00:22:31,940 --> 00:22:32,770 That's not what they're doing. 457 00:22:32,770 --> 00:22:35,200 They're there to help you get a good grade. 458 00:22:35,200 --> 00:22:37,995 And these are some really talented people, people who 459 00:22:37,995 --> 00:22:46,500 are very famous in their own areas of expertise. 460 00:22:46,500 --> 00:22:49,820 So be gracious, is the main thing I would say. 461 00:22:49,820 --> 00:22:51,070 Be gracious. 462 00:22:51,070 --> 00:22:54,310 463 00:22:54,310 --> 00:22:56,970 Let's see. 464 00:22:56,970 --> 00:22:58,430 The assignments. 465 00:22:58,430 --> 00:22:59,920 You can read about the assignments. 466 00:22:59,920 --> 00:23:01,810 Basically, generally, what we're going to do is have a 467 00:23:01,810 --> 00:23:06,390 beta submission in which we're going to test everything. 468 00:23:06,390 --> 00:23:11,390 And we're also going to take all your tests. 469 00:23:11,390 --> 00:23:14,160 We're going to throw all the tests into a big pool and run 470 00:23:14,160 --> 00:23:17,160 everybody's tests against everybody software. 471 00:23:17,160 --> 00:23:20,730 You will lose points if you fail some of the tests, 472 00:23:20,730 --> 00:23:22,430 assuming the tests are correct tests. 473 00:23:22,430 --> 00:23:24,600 You will lose points if you have an incorrect test. 474 00:23:24,600 --> 00:23:27,470 You will lose points if somebody passes your test and 475 00:23:27,470 --> 00:23:29,490 they have a bug. 476 00:23:29,490 --> 00:23:32,310 You will get points if your test picks up something that 477 00:23:32,310 --> 00:23:36,550 very few other groups tested for. 478 00:23:36,550 --> 00:23:39,580 So you have to both submit code and submit tests, which 479 00:23:39,580 --> 00:23:42,560 is OK, because you've got to test your own stuff before you 480 00:23:42,560 --> 00:23:44,960 hand it in. 481 00:23:44,960 --> 00:23:48,370 But we're also going to be using it to test each other. 482 00:23:48,370 --> 00:23:53,935 After beta, we will have a design and code review with 483 00:23:53,935 --> 00:23:55,950 your master. 484 00:23:55,950 --> 00:23:58,470 And let me just say, the master's there not affecting 485 00:23:58,470 --> 00:24:00,600 your grade in any way. 486 00:24:00,600 --> 00:24:03,060 They do give us reports about what happened in their 487 00:24:03,060 --> 00:24:05,520 meeting, but they have no bearing on the grade. 488 00:24:05,520 --> 00:24:06,790 They're just there to help. 489 00:24:06,790 --> 00:24:09,480 490 00:24:09,480 --> 00:24:14,190 So you don't have to view them as-- 491 00:24:14,190 --> 00:24:16,080 they're not part of the course staff. 492 00:24:16,080 --> 00:24:17,770 If you want to get angry at somebody about how things are 493 00:24:17,770 --> 00:24:20,320 done, get angry at Saman and me. 494 00:24:20,320 --> 00:24:23,510 Don't get angry at them. 495 00:24:23,510 --> 00:24:27,790 The final submission is generally due a couple weeks 496 00:24:27,790 --> 00:24:28,830 after the beta. 497 00:24:28,830 --> 00:24:31,340 And it's after you've gotten feedback on your design, 498 00:24:31,340 --> 00:24:34,090 feedback on things. 499 00:24:34,090 --> 00:24:36,040 We'll give you the whole battery of everybody's 500 00:24:36,040 --> 00:24:39,110 regression tests, so you can test out your code against 501 00:24:39,110 --> 00:24:42,950 everybody's regression test and use that. 502 00:24:42,950 --> 00:24:45,980 And then generally, the scoring is such that people 503 00:24:45,980 --> 00:24:49,590 who get fast code get high marks. 504 00:24:49,590 --> 00:24:51,840 People who get slow codes get lower marks. 505 00:24:51,840 --> 00:24:55,040 506 00:24:55,040 --> 00:24:56,490 If you need help, [UNINTELLIGIBLE]. 507 00:24:56,490 --> 00:24:59,320 Speed is fun. 508 00:24:59,320 --> 00:25:00,570 Enjoy the class. 509 00:25:00,570 --> 00:25:03,600 510 00:25:03,600 --> 00:25:05,192 Yeah, question? 511 00:25:05,192 --> 00:25:08,690 AUDIENCE: In terms of the bug testing, I'm 512 00:25:08,690 --> 00:25:10,121 thinking of bugs like-- 513 00:25:10,121 --> 00:25:11,075 I don't know-- 514 00:25:11,075 --> 00:25:12,029 [INAUDIBLE] 515 00:25:12,029 --> 00:25:14,891 or something like that. 516 00:25:14,891 --> 00:25:17,700 If you happen to not have made a bug in your own program 517 00:25:17,700 --> 00:25:20,206 [INAUDIBLE] 518 00:25:20,206 --> 00:25:23,200 would that count against you in the final testing. 519 00:25:23,200 --> 00:25:28,566 PROFESSOR: You have to view your beta test as testing 520 00:25:28,566 --> 00:25:30,510 other peoples' code also. 521 00:25:30,510 --> 00:25:34,330 PROFESSOR: But if you [UNINTELLIGIBLE] a test that 522 00:25:34,330 --> 00:25:39,000 finds bugs in somebody's program, and if very few 523 00:25:39,000 --> 00:25:42,150 people wrote that test, then you will get points. 524 00:25:42,150 --> 00:25:46,110 So you don't have to test for the world, everything. 525 00:25:46,110 --> 00:25:48,940 If you fail somebody's test, but not yours, you will be 526 00:25:48,940 --> 00:25:50,160 deducted points. 527 00:25:50,160 --> 00:25:53,490 If your test finds other people's bugs, you will gain 528 00:25:53,490 --> 00:25:54,790 points for that. 529 00:25:54,790 --> 00:25:58,120 But on the other hand, if you don't have a test for-- 530 00:25:58,120 --> 00:26:01,680 what you said, I think you don't have to go and test for 531 00:26:01,680 --> 00:26:04,750 every possible condition in there. 532 00:26:04,750 --> 00:26:06,990 The negative side is, if you miss something, 533 00:26:06,990 --> 00:26:07,940 we will deduct points. 534 00:26:07,940 --> 00:26:11,390 Positive side is, if your test finds somebody else's bugs, 535 00:26:11,390 --> 00:26:12,480 you'll get points for it. 536 00:26:12,480 --> 00:26:15,150 We'll go into detail exactly how [UNINTELLIGIBLE]. 537 00:26:15,150 --> 00:26:18,430 Another thing we did last year, which was kind of 538 00:26:18,430 --> 00:26:18,870 interesting-- 539 00:26:18,870 --> 00:26:21,300 I think there were mixed-- 540 00:26:21,300 --> 00:26:22,650 some people really liked it, some people 541 00:26:22,650 --> 00:26:23,630 were not that happy-- 542 00:26:23,630 --> 00:26:26,890 is a little bit of competitive grading. 543 00:26:26,890 --> 00:26:29,420 So here, what we are doing is not the full grade-- 544 00:26:29,420 --> 00:26:30,340 small part of the grade. 545 00:26:30,340 --> 00:26:37,510 What you do is when you first submit your beta, rerun 546 00:26:37,510 --> 00:26:40,110 performance tests of everybody. 547 00:26:40,110 --> 00:26:42,010 And of course, one of you are going to get the 548 00:26:42,010 --> 00:26:44,130 fastest code for that. 549 00:26:44,130 --> 00:26:45,850 At that point, we are going to say, OK, this 550 00:26:45,850 --> 00:26:47,880 is the fastest code. 551 00:26:47,880 --> 00:26:49,090 And how [UNINTELLIGIBLE] 552 00:26:49,090 --> 00:26:51,870 everybody else with that code? 553 00:26:51,870 --> 00:26:57,030 And the person who gets fastest code for that part of 554 00:26:57,030 --> 00:26:58,670 the grade gets the full points. 555 00:26:58,670 --> 00:27:02,150 And everybody else will get a partial part of the points. 556 00:27:02,150 --> 00:27:06,170 However, you get a chance in your final submission to 557 00:27:06,170 --> 00:27:08,880 basically improve your code. 558 00:27:08,880 --> 00:27:11,610 If you do better than the last time's best code, you won't 559 00:27:11,610 --> 00:27:13,250 get any more additional points, because then we don't 560 00:27:13,250 --> 00:27:14,370 want to create [? an arms race. ?] 561 00:27:14,370 --> 00:27:17,020 So you at least had to match the previous 562 00:27:17,020 --> 00:27:19,970 times to get full points. 563 00:27:19,970 --> 00:27:23,870 So you get a chance to see how you stand against your peers 564 00:27:23,870 --> 00:27:26,560 and get a chance to actually [UNINTELLIGIBLE] math with 565 00:27:26,560 --> 00:27:27,030 your peers. 566 00:27:27,030 --> 00:27:32,220 So first time we did it, we said, OK, it's the fraction of 567 00:27:32,220 --> 00:27:37,200 your performance against the performance of the best code. 568 00:27:37,200 --> 00:27:40,680 The problem was, there were a couple of codes that 1000x 569 00:27:40,680 --> 00:27:42,690 performance improvement, and everybody got 10x. 570 00:27:42,690 --> 00:27:45,170 So you've got, like, 0.25 grade. 571 00:27:45,170 --> 00:27:46,060 That didn't help. 572 00:27:46,060 --> 00:27:49,140 So now we actually are doing a logarithmic log scale. 573 00:27:49,140 --> 00:27:53,300 So we'll get the fraction, and we'll take the log off that. 574 00:27:53,300 --> 00:27:56,080 So there will be a little bit more equality. 575 00:27:56,080 --> 00:27:59,570 So you will know how worse off you are, and you get a chance 576 00:27:59,570 --> 00:28:00,160 to [UNINTELLIGIBLE]. 577 00:28:00,160 --> 00:28:02,570 Because it is very important, because a lot of times, there 578 00:28:02,570 --> 00:28:05,480 might be a crucial thing that you missed. 579 00:28:05,480 --> 00:28:08,780 And instead of trying to be, "aha, got you" in this class, 580 00:28:08,780 --> 00:28:11,700 we are trying to give you a chance to actually learn. 581 00:28:11,700 --> 00:28:14,010 So if you missed it, OK, you missed it the first time. 582 00:28:14,010 --> 00:28:17,150 But you get the second chance to come and fix it and see 583 00:28:17,150 --> 00:28:18,560 whether you can do well. 584 00:28:18,560 --> 00:28:21,130 And of course, if you do better than the best grade, 585 00:28:21,130 --> 00:28:23,640 there are bragging rights, but you don't get extra points. 586 00:28:23,640 --> 00:28:27,960 587 00:28:27,960 --> 00:28:31,870 With that, let me-- 588 00:28:31,870 --> 00:28:33,120 where are we? 589 00:28:33,120 --> 00:28:36,610 590 00:28:36,610 --> 00:28:40,990 So again, as Charles pointed out, I'm pretty excited about 591 00:28:40,990 --> 00:28:41,390 this class. 592 00:28:41,390 --> 00:28:43,130 This is a really fun class. 593 00:28:43,130 --> 00:28:44,230 Unfortunately, I'm a little bit under the weather. 594 00:28:44,230 --> 00:28:46,030 So I'm going to sit down. 595 00:28:46,030 --> 00:28:52,230 And if I don't sound that enthusiastic, hopefully I'll 596 00:28:52,230 --> 00:28:53,390 get enthusiastic later. 597 00:28:53,390 --> 00:28:56,730 And that's one reason we ask you to come forward, because 598 00:28:56,730 --> 00:28:58,470 we don't have to come here and shout. 599 00:28:58,470 --> 00:29:00,400 So hopefully people in the back can here. 600 00:29:00,400 --> 00:29:02,890 If you don't, still, there are some seats in the front. 601 00:29:02,890 --> 00:29:05,290 You can move forward. 602 00:29:05,290 --> 00:29:09,090 So for the rest of the class, what we want to do is go 603 00:29:09,090 --> 00:29:13,180 through an example and look at why performance matters. 604 00:29:13,180 --> 00:29:16,270 605 00:29:16,270 --> 00:29:17,900 For that, I selected matrix multiply. 606 00:29:17,900 --> 00:29:20,890 And I think your Project 0, you are going to also play 607 00:29:20,890 --> 00:29:23,010 with matrix multiply. 608 00:29:23,010 --> 00:29:26,730 And I think that you'll get a feel for what kind of things 609 00:29:26,730 --> 00:29:31,730 will happened and how much room you have in 610 00:29:31,730 --> 00:29:33,640 doing well in here. 611 00:29:33,640 --> 00:29:36,260 And at the same time, as I go through this, you'll get a 612 00:29:36,260 --> 00:29:39,750 feel for what's the process that you might be following 613 00:29:39,750 --> 00:29:41,750 through the class and what kind of things-- 614 00:29:41,750 --> 00:29:46,100 because linear programming, after taking a bunch of 615 00:29:46,100 --> 00:29:48,780 programming classes, probably, you have a methodology set up 616 00:29:48,780 --> 00:29:51,190 in your mind, how to go about doing that. 617 00:29:51,190 --> 00:29:54,500 And the same thing after this class, we want you to have a 618 00:29:54,500 --> 00:29:57,580 methodology of going through and improving performance. 619 00:29:57,580 --> 00:30:02,590 And this is basically a little bit of a contrite example, but 620 00:30:02,590 --> 00:30:04,120 you'll get a feel. 621 00:30:04,120 --> 00:30:06,270 So what's matrix multiply? 622 00:30:06,270 --> 00:30:10,000 It's a very fundamental operation for many 623 00:30:10,000 --> 00:30:10,700 computations. 624 00:30:10,700 --> 00:30:13,940 So you are going to produce Matrix A by multiplying Matrix 625 00:30:13,940 --> 00:30:18,410 B and C. So you want to calculate each value in here 626 00:30:18,410 --> 00:30:22,330 to take a row of B and a column of C, and you get each 627 00:30:22,330 --> 00:30:27,150 element of A. And you do that for every element of A. And 628 00:30:27,150 --> 00:30:29,900 this is the possible simplest code you can write in a 629 00:30:29,900 --> 00:30:32,150 language like C, a [? three-loop nest, ?] 630 00:30:32,150 --> 00:30:34,980 and this matrix multiply. 631 00:30:34,980 --> 00:30:38,310 So, so far, things should be pretty simple. 632 00:30:38,310 --> 00:30:42,030 So what I did was I looked at matrix multiply and said, 633 00:30:42,030 --> 00:30:45,820 look, you have taken software engineering class. 634 00:30:45,820 --> 00:30:48,660 You have thought about object-oriented programming, 635 00:30:48,660 --> 00:30:51,160 lot of nice techniques to build very 636 00:30:51,160 --> 00:30:52,720 useful, portable software. 637 00:30:52,720 --> 00:30:55,100 And let's try to write matrix multiply with 638 00:30:55,100 --> 00:30:57,090 some of those concepts. 639 00:30:57,090 --> 00:31:00,310 So the first thing I want to do is matrix multiply write 640 00:31:00,310 --> 00:31:04,640 object oriented using immutable classes, and that, 641 00:31:04,640 --> 00:31:06,730 in fact, I want matrix class to represent both 642 00:31:06,730 --> 00:31:08,470 integers and doubles. 643 00:31:08,470 --> 00:31:11,200 So here's my class structuring here. 644 00:31:11,200 --> 00:31:14,680 And actually, I'm going to show you a bunch of code now, 645 00:31:14,680 --> 00:31:15,490 next couple of slides. 646 00:31:15,490 --> 00:31:17,540 I'm not expecting you to follow everything 647 00:31:17,540 --> 00:31:18,350 instantaneously. 648 00:31:18,350 --> 00:31:20,140 I'm going to go through somewhat faster. 649 00:31:20,140 --> 00:31:21,570 But you have the slides. 650 00:31:21,570 --> 00:31:25,140 So I always expect you to, perhaps, in some free time, to 651 00:31:25,140 --> 00:31:27,590 go and look at some of the slides, get a little bit of 652 00:31:27,590 --> 00:31:28,990 better feel for what happened. 653 00:31:28,990 --> 00:31:32,590 So here's what matrix multiply does. 654 00:31:32,590 --> 00:31:38,750 I have a value class that figures out the matrix type. 655 00:31:38,750 --> 00:31:40,300 And if it's integer, it keep integers. 656 00:31:40,300 --> 00:31:42,770 If it's a double, I keep a double. 657 00:31:42,770 --> 00:31:46,190 And here, basically, do this value in here. 658 00:31:46,190 --> 00:31:49,300 And then I have a matrix class. 659 00:31:49,300 --> 00:31:53,820 Basically, given a matrix type and a matrix, it can create a 660 00:31:53,820 --> 00:31:54,650 matrix here. 661 00:31:54,650 --> 00:32:01,930 This basically creates your matrix in here by first 662 00:32:01,930 --> 00:32:06,050 creating a bunch of rows here, and then instantiating each 663 00:32:06,050 --> 00:32:09,280 row in here, whether it's integer or double, depending 664 00:32:09,280 --> 00:32:11,340 on the type you want. 665 00:32:11,340 --> 00:32:13,180 And then, of course, there are other things in matrix, 666 00:32:13,180 --> 00:32:15,590 basically, in here. 667 00:32:15,590 --> 00:32:17,210 And you can update a matrix. 668 00:32:17,210 --> 00:32:18,590 This is going to be a little bit interesting. 669 00:32:18,590 --> 00:32:21,290 I will give a little bit here, because what I'm going is I'm 670 00:32:21,290 --> 00:32:22,380 doing immutable types. 671 00:32:22,380 --> 00:32:24,390 That means I can't change anything in the matrix. 672 00:32:24,390 --> 00:32:27,780 If you are updating, you have to create a new matrix. 673 00:32:27,780 --> 00:32:29,140 So I'll show a little bit about how to 674 00:32:29,140 --> 00:32:30,040 go about doing that. 675 00:32:30,040 --> 00:32:33,060 And then there's this row class. 676 00:32:33,060 --> 00:32:34,590 I'm going through this fast, because 677 00:32:34,590 --> 00:32:35,840 it's not that important. 678 00:32:35,840 --> 00:32:39,820 679 00:32:39,820 --> 00:32:42,730 If you get a chance, go through this and get a feel 680 00:32:42,730 --> 00:32:43,710 for what I'm doing. 681 00:32:43,710 --> 00:32:48,565 But the idea is doing this very software engineering way 682 00:32:48,565 --> 00:32:51,270 of implementing a matrix multiply. 683 00:32:51,270 --> 00:32:53,880 So then here's my matrix multiply in here, that what 684 00:32:53,880 --> 00:32:56,550 I'm doing is I'm updating each of these matrices by giving a 685 00:32:56,550 --> 00:33:01,340 value of B and C and adding it to A and 686 00:33:01,340 --> 00:33:03,960 updating A again in here. 687 00:33:03,960 --> 00:33:09,370 So when I run matrix multiply, 1,024 by 1,024 matrix, this is 688 00:33:09,370 --> 00:33:10,606 the time I got. 689 00:33:10,606 --> 00:33:13,520 Actually, it took a long time, in fact. 690 00:33:13,520 --> 00:33:17,410 And the key thing, is the performance good? 691 00:33:17,410 --> 00:33:19,240 That's the first question to ask. 692 00:33:19,240 --> 00:33:22,190 Because you run, you get a result. 693 00:33:22,190 --> 00:33:23,240 When the result is correct, OK, great. 694 00:33:23,240 --> 00:33:24,410 You get the correct result. 695 00:33:24,410 --> 00:33:25,730 But how do you do? 696 00:33:25,730 --> 00:33:28,470 And this is sometimes a very hard thing, to know that you 697 00:33:28,470 --> 00:33:32,630 have a problem is itself a huge win-- 698 00:33:32,630 --> 00:33:33,760 that you figure out you have a problem. 699 00:33:33,760 --> 00:33:36,660 And what I'm going to do is go through a little bit of a 700 00:33:36,660 --> 00:33:39,510 calculation to see how we did it. 701 00:33:39,510 --> 00:33:43,200 It took about five hours to do this matrix multiply, and that 702 00:33:43,200 --> 00:33:45,380 doesn't look right already. 703 00:33:45,380 --> 00:33:47,570 So if you think about matrix multiply, it's an N-cube 704 00:33:47,570 --> 00:33:51,640 operation, basically, what matrix multiply is doing. 705 00:33:51,640 --> 00:33:53,150 That's the algorithmic [? class. ?] 706 00:33:53,150 --> 00:33:56,330 If you look at Charles' book, that's what it would say. 707 00:33:56,330 --> 00:33:57,230 It's N-cube operation. 708 00:33:57,230 --> 00:34:00,700 So this is the number of operations you want to do. 709 00:34:00,700 --> 00:34:04,820 And for each matrix operation, you only do three index 710 00:34:04,820 --> 00:34:09,710 updaters, add, a multiply, and a check-- 711 00:34:09,710 --> 00:34:12,310 probably six ops to do, basically. 712 00:34:12,310 --> 00:34:15,929 And that means I need to do this many operations. 713 00:34:15,929 --> 00:34:19,670 Basically, you can think about machine operations to do, 714 00:34:19,670 --> 00:34:23,659 basically, because you have to read three values, update one, 715 00:34:23,659 --> 00:34:28,260 do a multiply and add, and do a branch. 716 00:34:28,260 --> 00:34:31,949 And if I look at that with my hours, I am doing this many 717 00:34:31,949 --> 00:34:33,929 operations per second, which looks high. 718 00:34:33,929 --> 00:34:36,920 Big number. 719 00:34:36,920 --> 00:34:41,310 But my machine, actually, on my somewhat old PC, transfers 720 00:34:41,310 --> 00:34:42,239 even much faster. 721 00:34:42,239 --> 00:34:48,500 So if you think about it, I am taking about 8,300 cycles to 722 00:34:48,500 --> 00:34:50,570 do any kind of a machine operation. 723 00:34:50,570 --> 00:34:51,880 This doesn't look right. 724 00:34:51,880 --> 00:34:55,110 725 00:34:55,110 --> 00:34:58,620 So at this point, this is what I would call a "backup 726 00:34:58,620 --> 00:35:02,100 [UNINTELLIGIBLE] calculation." I haven't proven anything. 727 00:35:02,100 --> 00:35:05,180 This one worked well in the theory class, but if you can 728 00:35:05,180 --> 00:35:07,845 just jot down a couple of these things, you can get a 729 00:35:07,845 --> 00:35:10,740 feel for where you stand very fast. 730 00:35:10,740 --> 00:35:13,530 And then you realize, OK, I am a couple of orders of 731 00:35:13,530 --> 00:35:15,870 magnitude off here on what best I could do. 732 00:35:15,870 --> 00:35:18,630 And that should give you a feel for what's going on. 733 00:35:18,630 --> 00:35:21,250 OK, so what's going on here? 734 00:35:21,250 --> 00:35:22,410 How can we improve performance? 735 00:35:22,410 --> 00:35:24,510 Because we know we have a problem. 736 00:35:24,510 --> 00:35:26,850 So what we can do is we can look deeply 737 00:35:26,850 --> 00:35:28,370 into the program execution. 738 00:35:28,370 --> 00:35:30,570 And at some points, OK, this program runs slow. 739 00:35:30,570 --> 00:35:31,070 [UNINTELLIGIBLE] 740 00:35:31,070 --> 00:35:33,190 time spent. 741 00:35:33,190 --> 00:35:37,270 So we can figure out time spent, basically, things like 742 00:35:37,270 --> 00:35:38,580 by method-- 743 00:35:38,580 --> 00:35:41,300 which method you spend most of the time running. 744 00:35:41,300 --> 00:35:42,740 Sometimes by line. 745 00:35:42,740 --> 00:35:45,790 And we will learn a bunch of profiling tools that will give 746 00:35:45,790 --> 00:35:46,580 you this feel. 747 00:35:46,580 --> 00:35:48,780 So instead of reading a million lines of code, looking 748 00:35:48,780 --> 00:35:52,590 at every line in there, you can very fast converge onto 749 00:35:52,590 --> 00:35:55,120 the culprits where most of the time being spent. 750 00:35:55,120 --> 00:35:58,640 And that's a very easy way of triaging your problem. 751 00:35:58,640 --> 00:36:03,620 And you look at things like time spent, cumulative time 752 00:36:03,620 --> 00:36:06,440 spent within that and everything called underneath, 753 00:36:06,440 --> 00:36:08,540 number of invocations of the functions. 754 00:36:08,540 --> 00:36:12,130 And a lot times, you have a feel for what it should be. 755 00:36:12,130 --> 00:36:14,720 And when you look at this, if it doesn't look right, you 756 00:36:14,720 --> 00:36:17,030 say, wait a minute, this goes against my intuition. 757 00:36:17,030 --> 00:36:19,320 Perhaps there's a problem. 758 00:36:19,320 --> 00:36:23,690 So what you can do is, by doing this, you can 759 00:36:23,690 --> 00:36:24,930 identify hot spots. 760 00:36:24,930 --> 00:36:27,910 Basically, if 90% of the time it's it one routine, that's 761 00:36:27,910 --> 00:36:29,870 what you want to fix. 762 00:36:29,870 --> 00:36:34,220 And hopefully, that routine is doing something too slow. 763 00:36:34,220 --> 00:36:38,080 Then you can actually get good performance in here. 764 00:36:38,080 --> 00:36:43,040 So here's the profile I ran for a matrix multiply. 765 00:36:43,040 --> 00:36:46,040 And interestingly, what I found was, most of the time 766 00:36:46,040 --> 00:36:50,000 spent in basically creating doubles-- 767 00:36:50,000 --> 00:36:55,430 number of calls, time spent here, this much time was 768 00:36:55,430 --> 00:37:02,710 spending, basically creating new values in here, this 769 00:37:02,710 --> 00:37:03,960 initializing a double in here. 770 00:37:03,960 --> 00:37:07,940 771 00:37:07,940 --> 00:37:09,110 So I'm doing matrix multiply. 772 00:37:09,110 --> 00:37:11,360 Why should I be initializing values? 773 00:37:11,360 --> 00:37:12,510 So this is an issue. 774 00:37:12,510 --> 00:37:16,290 And I have the most number of calls to that too. 775 00:37:16,290 --> 00:37:18,530 So we can go look at it and say, hey, that would seem to 776 00:37:18,530 --> 00:37:19,960 be a problem here. 777 00:37:19,960 --> 00:37:22,580 So the problem here was kind of obvious. 778 00:37:22,580 --> 00:37:25,350 What happened was I didn't use immutable [? class. ?] 779 00:37:25,350 --> 00:37:27,090 So this is my matrix representation. 780 00:37:27,090 --> 00:37:31,360 I have a bunch of pointers pointing to each row. 781 00:37:31,360 --> 00:37:34,040 And now assume I won't update this element. 782 00:37:34,040 --> 00:37:36,320 And of course, I can't modify the entire thing. 783 00:37:36,320 --> 00:37:37,120 It's immutable. 784 00:37:37,120 --> 00:37:41,490 So what I had to do is create a copy that updates. 785 00:37:41,490 --> 00:37:43,580 And then, of course, I had to create a new matrix. 786 00:37:43,580 --> 00:37:46,600 I don't have to copy everybody else, because these guys were 787 00:37:46,600 --> 00:37:47,350 not changed. 788 00:37:47,350 --> 00:37:50,920 So I can place a pointer to these things and create this 789 00:37:50,920 --> 00:37:54,140 new matrix that points to these things. 790 00:37:54,140 --> 00:37:56,870 And then, if you think about what's going on, in fact, and 791 00:37:56,870 --> 00:38:00,705 this is my new matrix after I do that, I [? updated ?] two 792 00:38:00,705 --> 00:38:03,625 end copies for each update, because what I had to do is I 793 00:38:03,625 --> 00:38:06,280 had to copy this entire row up to get that. 794 00:38:06,280 --> 00:38:08,750 And I had to create this entire index again for each 795 00:38:08,750 --> 00:38:09,930 point in each row. 796 00:38:09,930 --> 00:38:12,890 So just to update one element, I am creating two end copies. 797 00:38:12,890 --> 00:38:16,300 So what that means is now our N cube matrix multiply ends up 798 00:38:16,300 --> 00:38:19,240 being N to the power of 4 algorithm. 799 00:38:19,240 --> 00:38:21,290 So suddenly, you realize, wait a minute, I'm doing something 800 00:38:21,290 --> 00:38:21,600 bad in here. 801 00:38:21,600 --> 00:38:22,340 I'm actually [UNINTELLIGIBLE] 802 00:38:22,340 --> 00:38:25,900 N-cube algorithm, and the way I saw, it becomes n to the 803 00:38:25,900 --> 00:38:28,600 power of 4 because I'm making all these copies in there. 804 00:38:28,600 --> 00:38:32,290 So this is a bad thing to do, of course. 805 00:38:32,290 --> 00:38:34,700 And copying is very costly, because you're making 806 00:38:34,700 --> 00:38:36,060 duplicates in here. 807 00:38:36,060 --> 00:38:38,430 And of course, you're creating a huge amount of garbage, 808 00:38:38,430 --> 00:38:41,660 because you are just copying and leaving the previous value 809 00:38:41,660 --> 00:38:42,570 for garbage collection. 810 00:38:42,570 --> 00:38:45,480 So garbage collector's coming up all the time, and huge 811 00:38:45,480 --> 00:38:46,180 memory footprint. 812 00:38:46,180 --> 00:38:48,280 So this is no good. 813 00:38:48,280 --> 00:38:50,220 And can we do better? 814 00:38:50,220 --> 00:38:51,590 Hopefully, we can. 815 00:38:51,590 --> 00:38:53,890 So the next thing we did was, OK, just get rid of these 816 00:38:53,890 --> 00:38:55,850 immutable types. 817 00:38:55,850 --> 00:38:58,410 That seemed to be a [UNINTELLIGIBLE] 818 00:38:58,410 --> 00:38:59,460 here. 819 00:38:59,460 --> 00:39:01,570 And then, I actually have-- 820 00:39:01,570 --> 00:39:03,610 I will go faster over the code, so if you are 821 00:39:03,610 --> 00:39:06,330 interested, you can go look at the code in here. 822 00:39:06,330 --> 00:39:12,420 So here, what we did was, still we have the issues like 823 00:39:12,420 --> 00:39:15,330 object oriented and ability to represent both 824 00:39:15,330 --> 00:39:16,850 integers and doubles. 825 00:39:16,850 --> 00:39:19,710 So what that means is each matrix can be the integer 826 00:39:19,710 --> 00:39:21,360 matrix or a double matrix. 827 00:39:21,360 --> 00:39:24,430 So this is where you're instantiating with integers or 828 00:39:24,430 --> 00:39:26,890 doubles in here. 829 00:39:26,890 --> 00:39:28,780 So we do that. 830 00:39:28,780 --> 00:39:31,670 And how much you think this can improve performance by 831 00:39:31,670 --> 00:39:33,070 doing that? 832 00:39:33,070 --> 00:39:34,680 In fact, this was pretty nice. 833 00:39:34,680 --> 00:39:37,810 I actually got a 219x performance improvement 834 00:39:37,810 --> 00:39:40,560 because I went from n to the power 4 algorithm to n to the 835 00:39:40,560 --> 00:39:41,590 power 3 algorithm. 836 00:39:41,590 --> 00:39:44,450 So here is interesting, algorithm has changed with it, 837 00:39:44,450 --> 00:39:48,810 and we have a pretty nice performance improvement. 838 00:39:48,810 --> 00:39:53,210 So if I look at this now, what I find is I'm spending a lot 839 00:39:53,210 --> 00:39:56,750 of method time, I'm doing matrix multiply. 840 00:39:56,750 --> 00:39:57,780 That's good. 841 00:39:57,780 --> 00:39:59,890 But I'm spending all this time doing get 842 00:39:59,890 --> 00:40:02,500 double and get methods. 843 00:40:02,500 --> 00:40:03,850 I'm trying to do matrix multiply. 844 00:40:03,850 --> 00:40:05,730 Why am I spending all this time trying to do 845 00:40:05,730 --> 00:40:08,630 get double and get? 846 00:40:08,630 --> 00:40:13,360 What I found was issue is the method call overhead. 847 00:40:13,360 --> 00:40:15,740 Now, what happened is matrix row has two different matrix 848 00:40:15,740 --> 00:40:17,690 types, the integer row and double row. 849 00:40:17,690 --> 00:40:22,870 So every time I want to get a row, I don't know if I have 850 00:40:22,870 --> 00:40:24,030 integer row or double row. 851 00:40:24,030 --> 00:40:26,800 I have to do a look-up to figure out my type. 852 00:40:26,800 --> 00:40:30,480 Am I working with the integer matrix or matrix in doubles? 853 00:40:30,480 --> 00:40:33,760 And this is what we call "dynamic dispatch." So instead 854 00:40:33,760 --> 00:40:36,460 of knowing exactly the methods we are to go, I had to 855 00:40:36,460 --> 00:40:38,380 actually look up, OK, which method I'm supposed to go. 856 00:40:38,380 --> 00:40:41,580 I have to do that since, if the matrix is integer, I might 857 00:40:41,580 --> 00:40:44,550 have to call a different method than matrix is double. 858 00:40:44,550 --> 00:40:45,480 I have to check that. 859 00:40:45,480 --> 00:40:47,890 And then that call, instead of direct call, 860 00:40:47,890 --> 00:40:49,320 becomes indirect branch. 861 00:40:49,320 --> 00:40:52,010 I have to check that and basically branch on that. 862 00:40:52,010 --> 00:40:54,290 And these indirect branches are costly, because what that 863 00:40:54,290 --> 00:40:58,260 means is I don't where I am going until I do that test. 864 00:40:58,260 --> 00:41:00,250 So no machine-- 865 00:41:00,250 --> 00:41:02,490 we'll go into details and pipelines and 866 00:41:02,490 --> 00:41:03,780 stuff as we go on. 867 00:41:03,780 --> 00:41:06,160 What that means is normally, a machine can have a lot of 868 00:41:06,160 --> 00:41:07,600 instruction in flight. 869 00:41:07,600 --> 00:41:11,130 So that means that I have to know a lot more information to 870 00:41:11,130 --> 00:41:12,560 get the instructions going. 871 00:41:12,560 --> 00:41:15,660 But here, until you resolve the instructions, you can't do 872 00:41:15,660 --> 00:41:16,580 the next one. 873 00:41:16,580 --> 00:41:18,870 So it's called a, basically, pipeline stall. 874 00:41:18,870 --> 00:41:21,150 So instead of having hundreds of instructions going, 875 00:41:21,150 --> 00:41:23,290 basically, suddenly I can't do anything until the previous 876 00:41:23,290 --> 00:41:24,720 instruction is done. 877 00:41:24,720 --> 00:41:29,850 So the machine slows down a lot because of this. 878 00:41:29,850 --> 00:41:33,210 879 00:41:33,210 --> 00:41:34,700 Normally, what you can do is keep fetching next 880 00:41:34,700 --> 00:41:35,440 instructions here. 881 00:41:35,440 --> 00:41:38,730 We have to fetch, test, then the test result is available, 882 00:41:38,730 --> 00:41:41,476 then that's when you can fetch the next instruction. 883 00:41:41,476 --> 00:41:44,860 884 00:41:44,860 --> 00:41:45,940 Let me go through that file. 885 00:41:45,940 --> 00:41:48,800 So direct batch means target address is known. 886 00:41:48,800 --> 00:41:50,970 You can fetch ahead of the targets/ If I know where I am 887 00:41:50,970 --> 00:41:53,660 going, even if I don't get there, I can fetch ahead of 888 00:41:53,660 --> 00:41:54,070 the target. 889 00:41:54,070 --> 00:41:58,970 I can go start fetching what happens next ahead, basically, 890 00:41:58,970 --> 00:42:00,650 into [? the branch. ?] 891 00:42:00,650 --> 00:42:03,540 Sometimes, you can even fetch both directions, or the 892 00:42:03,540 --> 00:42:05,520 direction you might always go might get 893 00:42:05,520 --> 00:42:06,380 fetched by the architect. 894 00:42:06,380 --> 00:42:09,060 We'll talk about a lot of architecture details in the 895 00:42:09,060 --> 00:42:09,970 future lectures. 896 00:42:09,970 --> 00:42:12,150 Indirect branch means until I calculate the address, I have 897 00:42:12,150 --> 00:42:13,850 no idea where I'm going. 898 00:42:13,850 --> 00:42:15,490 That address is only calculated. 899 00:42:15,490 --> 00:42:20,720 And because of that, hardware slows down like crazy. 900 00:42:20,720 --> 00:42:25,280 So this seems to be not a good thing to have these integers 901 00:42:25,280 --> 00:42:28,310 and doubles instead of having one nice class of just 902 00:42:28,310 --> 00:42:29,360 [UNINTELLIGIBLE] 903 00:42:29,360 --> 00:42:30,330 might be nice. 904 00:42:30,330 --> 00:42:32,580 I just basically create a double class. 905 00:42:32,580 --> 00:42:34,730 It's all matrices are doubles. 906 00:42:34,730 --> 00:42:37,310 If I want integers, I will just rewrite my matrix class 907 00:42:37,310 --> 00:42:39,840 or copy and replace. 908 00:42:39,840 --> 00:42:43,670 Now my matrix class becomes much more simpler. 909 00:42:43,670 --> 00:42:47,310 This is what I got, and this is my row in here. 910 00:42:47,310 --> 00:42:52,220 And voila, minute I do that, I basically went up about a 2.4x 911 00:42:52,220 --> 00:42:54,890 performance improvement, because now I don't have to 912 00:42:54,890 --> 00:42:56,510 keep doing that. 913 00:42:56,510 --> 00:42:58,610 And altogether now, I have improved my performance by 914 00:42:58,610 --> 00:43:00,850 about 500x here. 915 00:43:00,850 --> 00:43:03,270 And for ops per cycle, we have gone down from 916 00:43:03,270 --> 00:43:07,186 8,000 to 38 to 16. 917 00:43:07,186 --> 00:43:09,020 So we do still better. 918 00:43:09,020 --> 00:43:12,880 And now we look at, again, my profile data. 919 00:43:12,880 --> 00:43:16,690 And what I see is, OK, I am spending a lot of time in 920 00:43:16,690 --> 00:43:17,060 matrix multiply. 921 00:43:17,060 --> 00:43:18,310 That's OK. 922 00:43:18,310 --> 00:43:20,760 But also, I am spending time in this matrix get method. 923 00:43:20,760 --> 00:43:26,410 I am trying to get each, basically, row and data 924 00:43:26,410 --> 00:43:27,660 elements of the matrix. 925 00:43:27,660 --> 00:43:32,480 926 00:43:32,480 --> 00:43:35,280 So here is the summary of these things. 927 00:43:35,280 --> 00:43:37,000 I'm going to just keep that one, because I gave you 928 00:43:37,000 --> 00:43:38,370 [INAUDIBLE]. 929 00:43:38,370 --> 00:43:41,500 So the problem in this object-orientedness is I 930 00:43:41,500 --> 00:43:43,560 [? handle ?] each row is represented separately-- 931 00:43:43,560 --> 00:43:45,380 nice objects. 932 00:43:45,380 --> 00:43:47,300 These objects overload the memory. 933 00:43:47,300 --> 00:43:50,130 And in fact, if I want to get to my row, I have to appoint a 934 00:43:50,130 --> 00:43:50,650 [? chase ?] 935 00:43:50,650 --> 00:43:51,610 to go get that. 936 00:43:51,610 --> 00:43:53,380 It's not contiguous in memory. 937 00:43:53,380 --> 00:43:55,970 And to get to every row, I have to actually go through 938 00:43:55,970 --> 00:43:58,270 this [? point ?] indexing in here. 939 00:43:58,270 --> 00:44:00,990 [UNINTELLIGIBLE] matrices are one, nice, simple objects. 940 00:44:00,990 --> 00:44:04,170 So we can say, OK, wait a minute. 941 00:44:04,170 --> 00:44:06,470 This method call overhead actually keeps dominating. 942 00:44:06,470 --> 00:44:09,195 And because I have done this nice object-oriented thing in 943 00:44:09,195 --> 00:44:14,270 here, so I can get to the object-oriented file here and 944 00:44:14,270 --> 00:44:15,800 say, OK, get to the objects. 945 00:44:15,800 --> 00:44:23,380 And I write this nice loop of [UNINTELLIGIBLE]. 946 00:44:23,380 --> 00:44:28,210 And I have two-dimensional matrices in here. 947 00:44:28,210 --> 00:44:30,670 And voila, I got another 2.2x performance 948 00:44:30,670 --> 00:44:32,230 improvement in here. 949 00:44:32,230 --> 00:44:35,400 OK, now I am down to about seven cycles for each 950 00:44:35,400 --> 00:44:36,510 operation in here. 951 00:44:36,510 --> 00:44:38,300 That's good. 952 00:44:38,300 --> 00:44:39,550 So what can we do here? 953 00:44:39,550 --> 00:44:42,750 So we did all of these things in Java, because it's supposed 954 00:44:42,750 --> 00:44:44,030 to be a nice language. 955 00:44:44,030 --> 00:44:45,510 But Java has a lot of overhead. 956 00:44:45,510 --> 00:44:47,900 It's like you do memory-bound checks. 957 00:44:47,900 --> 00:44:49,110 You have to do this bytecode interpreter 958 00:44:49,110 --> 00:44:50,700 and stuff like that. 959 00:44:50,700 --> 00:44:54,730 On the other hand, C can run much faster. 960 00:44:54,730 --> 00:44:56,170 You don't have to do any kind of memory-bound checks. 961 00:44:56,170 --> 00:44:59,740 The compilation happens before you run, so it is not part of 962 00:44:59,740 --> 00:45:02,430 your cost of doing anything. 963 00:45:02,430 --> 00:45:05,710 So this can work very much like C. You can just almost 964 00:45:05,710 --> 00:45:06,690 cut and paste. 965 00:45:06,690 --> 00:45:08,260 And if you know, C is [? index chain, ?] 966 00:45:08,260 --> 00:45:08,510 [? and this ?] 967 00:45:08,510 --> 00:45:10,560 to C is just changing a few lines. 968 00:45:10,560 --> 00:45:15,320 And I did that, and this is, basically, the C code in here. 969 00:45:15,320 --> 00:45:16,580 This is the matrix multiply. 970 00:45:16,580 --> 00:45:18,055 This is the kind of allocation in here. 971 00:45:18,055 --> 00:45:19,990 C can be a little bit painful to allocate. 972 00:45:19,990 --> 00:45:21,360 This is my matrix multiply. 973 00:45:21,360 --> 00:45:23,232 It will look similar. 974 00:45:23,232 --> 00:45:27,900 And I got 2.1x just by going from Java to C. This, I was 975 00:45:27,900 --> 00:45:30,280 surprised, because that piece of Java code looked very much 976 00:45:30,280 --> 00:45:33,700 close to C. But by just doing that, I actually got another 977 00:45:33,700 --> 00:45:35,320 2.1x performance gain. 978 00:45:35,320 --> 00:45:39,460 979 00:45:39,460 --> 00:45:41,545 Now this is kind of the high-level stuff. 980 00:45:41,545 --> 00:45:42,930 Now we have to go to nitty-gritty 981 00:45:42,930 --> 00:45:44,940 details in the hardware. 982 00:45:44,940 --> 00:45:47,890 So in modern hardware, there are these things called 983 00:45:47,890 --> 00:45:52,130 "performance counters" that hardware [UNINTELLIGIBLE] 984 00:45:52,130 --> 00:45:54,640 that we can look at what's happening inside hardware. 985 00:45:54,640 --> 00:45:58,910 We can get a feel for what happens inside hardware. 986 00:45:58,910 --> 00:46:03,010 And there are things called CPI, which clock cycles per 987 00:46:03,010 --> 00:46:04,060 instruction. 988 00:46:04,060 --> 00:46:06,590 So that means if the instructions are running very 989 00:46:06,590 --> 00:46:09,420 slow, you will have multiple clock cycles per instructions. 990 00:46:09,420 --> 00:46:10,690 That means instructions are stalling. 991 00:46:10,690 --> 00:46:12,400 Something is wrong. 992 00:46:12,400 --> 00:46:13,820 Cache misses. 993 00:46:13,820 --> 00:46:16,150 So data [UNINTELLIGIBLE] 994 00:46:16,150 --> 00:46:16,690 [? pay attention to ?] 995 00:46:16,690 --> 00:46:17,670 caches and stuff like that. 996 00:46:17,670 --> 00:46:20,870 If data are supposed to be in cache and have cache hits all 997 00:46:20,870 --> 00:46:22,490 the time, but if you are getting cache misses, 998 00:46:22,490 --> 00:46:23,490 something is wrong. 999 00:46:23,490 --> 00:46:24,920 Accesses are not good. 1000 00:46:24,920 --> 00:46:27,170 And then also the instructions retired. 1001 00:46:27,170 --> 00:46:29,400 That means, how many instructions are being 1002 00:46:29,400 --> 00:46:30,860 executed and retired? 1003 00:46:30,860 --> 00:46:34,630 If you are doing too much work, the number of 1004 00:46:34,630 --> 00:46:37,820 instructions retired will go high in here. 1005 00:46:37,820 --> 00:46:39,270 This will give you a few how many [? work been done. ?] 1006 00:46:39,270 --> 00:46:42,090 1007 00:46:42,090 --> 00:46:46,160 So if you look at this one, I have a CPI of 4. 1008 00:46:46,160 --> 00:46:51,350 That means for every instruction, need about almost 1009 00:46:51,350 --> 00:46:53,710 close to five clock cycles, which is not good. 1010 00:46:53,710 --> 00:46:55,620 I have pretty high cache miss rate, too-- 1011 00:46:55,620 --> 00:46:59,140 L1 cache miss rate and some L2 cache miss rate. 1012 00:46:59,140 --> 00:47:01,740 This shows how many instruction are in what they 1013 00:47:01,740 --> 00:47:04,360 call [? vector in the ?] instruction form called SSE 1014 00:47:04,360 --> 00:47:05,360 instructions. 1015 00:47:05,360 --> 00:47:07,540 And here's the number of instructions got retired. 1016 00:47:07,540 --> 00:47:11,240 So we just use that baseline case. 1017 00:47:11,240 --> 00:47:14,750 So one thing I can look at was my cache miss rates seem to be 1018 00:47:14,750 --> 00:47:16,130 pretty darn high. 1019 00:47:16,130 --> 00:47:17,940 What might be happening here? 1020 00:47:17,940 --> 00:47:19,920 So if you look at matrix multiply-- 1021 00:47:19,920 --> 00:47:23,480 so I have Matrix A in here and Matrix B-- 1022 00:47:23,480 --> 00:47:26,080 what happens is, to calculate this value in [? Matrix A ?] 1023 00:47:26,080 --> 00:47:32,540 here, basically, I'm going to go through this row in here, 1024 00:47:32,540 --> 00:47:34,540 and I'm going through column in here. 1025 00:47:34,540 --> 00:47:36,500 Problem with the column is it's all [? low in ?] 1026 00:47:36,500 --> 00:47:37,990 the cache. 1027 00:47:37,990 --> 00:47:39,860 This is your linear memory. 1028 00:47:39,860 --> 00:47:44,010 So because this is here two dimensional, but inside your 1029 00:47:44,010 --> 00:47:45,550 hardware, there's no two-dimensional memory. 1030 00:47:45,550 --> 00:47:47,810 You have a linear memory, so this is your linear memory. 1031 00:47:47,810 --> 00:47:49,190 This is how the memory look like. 1032 00:47:49,190 --> 00:47:50,530 And this is how the memory look like. 1033 00:47:50,530 --> 00:47:52,520 So each row and column is in these areas. 1034 00:47:52,520 --> 00:47:55,740 But here, to calculate this value, this is nice. 1035 00:47:55,740 --> 00:47:58,860 I go through memory that is contiguous. 1036 00:47:58,860 --> 00:48:02,120 But C, I am jumping all over my memory. 1037 00:48:02,120 --> 00:48:08,230 And jumping on memory is not that great cache behavior. 1038 00:48:08,230 --> 00:48:12,280 And contiguous can do much better, because what you do 1039 00:48:12,280 --> 00:48:15,150 is, when you get a cache, you get a cache line that has more 1040 00:48:15,150 --> 00:48:16,110 than one word. 1041 00:48:16,110 --> 00:48:18,090 So in here, what you're doing is you are getting large cache 1042 00:48:18,090 --> 00:48:21,130 lines only using one word before you go to the next one. 1043 00:48:21,130 --> 00:48:23,550 So that's not great in cache behavior. 1044 00:48:23,550 --> 00:48:25,710 So what you can do is you can make this memory contiguous. 1045 00:48:25,710 --> 00:48:29,650 1046 00:48:29,650 --> 00:48:32,765 So what you can do is, in matrix multiply, normally, n 1047 00:48:32,765 --> 00:48:35,390 to the cubed computation to the squared data. 1048 00:48:35,390 --> 00:48:39,530 So to help on n to the cubed, it's OK to do some n to the 1049 00:48:39,530 --> 00:48:40,350 squared processing. 1050 00:48:40,350 --> 00:48:44,130 That means you can do a data transport before you start 1051 00:48:44,130 --> 00:48:45,140 matrix multiply. 1052 00:48:45,140 --> 00:48:46,690 Do some pre-processing. 1053 00:48:46,690 --> 00:48:48,080 So this will also come in handy. 1054 00:48:48,080 --> 00:48:50,390 A lot of these problems you'll do, you'll say, wait a minute, 1055 00:48:50,390 --> 00:48:53,750 can I do a pre-process that's cheaper that will really 1056 00:48:53,750 --> 00:48:55,870 improve my main processing? 1057 00:48:55,870 --> 00:48:58,850 This is a theme that comes again and again. 1058 00:48:58,850 --> 00:49:01,410 And then we can get this n to the cubed running faster since 1059 00:49:01,410 --> 00:49:03,820 this n to the cubed is much larger than n to the squared. 1060 00:49:03,820 --> 00:49:06,980 You can amortize the cost of doing that. 1061 00:49:06,980 --> 00:49:10,970 And basically, to get that, we can transpose some matrix 1062 00:49:10,970 --> 00:49:12,270 going into the squared operations. 1063 00:49:12,270 --> 00:49:15,530 And then now n to the cubed might run much faster. 1064 00:49:15,530 --> 00:49:18,160 And in fact, what I have done here is do the transpose of 1065 00:49:18,160 --> 00:49:18,850 this matrix. 1066 00:49:18,850 --> 00:49:21,280 And now we run this one, and then everything will be in 1067 00:49:21,280 --> 00:49:23,040 contiguous memory. 1068 00:49:23,040 --> 00:49:26,290 And voila, I got another 3.4x performance improvement. 1069 00:49:26,290 --> 00:49:29,440 Now I am running almost at one cycle per op, 1070 00:49:29,440 --> 00:49:30,760 which is really nice. 1071 00:49:30,760 --> 00:49:33,295 But modern superscalars actually can do better, so we 1072 00:49:33,295 --> 00:49:35,100 can actually end up being even better. 1073 00:49:35,100 --> 00:49:35,890 So let's see. 1074 00:49:35,890 --> 00:49:37,790 So we are doing that. 1075 00:49:37,790 --> 00:49:39,030 So here is the interesting thing. 1076 00:49:39,030 --> 00:49:42,540 Now I look at my transpose [? spot. ?] 1077 00:49:42,540 --> 00:49:45,120 So if you look at that number of retired instructions, it's 1078 00:49:45,120 --> 00:49:47,450 more or less the same. 1079 00:49:47,450 --> 00:49:50,590 Number of SSE instructions, more or less the same. 1080 00:49:50,590 --> 00:49:55,960 I have a 2x improvement in my L1 cache miss rate. 1081 00:49:55,960 --> 00:50:00,090 But more than that, I have a 5x improvement in the CPI. 1082 00:50:00,090 --> 00:50:03,550 That means my instructions are stalling a lot 1083 00:50:03,550 --> 00:50:06,430 less now than before. 1084 00:50:06,430 --> 00:50:08,330 So this is where I got all my performance [UNINTELLIGIBLE], 1085 00:50:08,330 --> 00:50:11,220 because my instructions are running much faster. 1086 00:50:11,220 --> 00:50:13,260 So by looking at data-- 1087 00:50:13,260 --> 00:50:14,090 this is the kind of things you're 1088 00:50:14,090 --> 00:50:15,850 learning during the class-- 1089 00:50:15,850 --> 00:50:18,520 then you can get a feel for what actually is happening and 1090 00:50:18,520 --> 00:50:21,880 why you are getting that. 1091 00:50:21,880 --> 00:50:24,730 You can go do more in the cache, in memory system. 1092 00:50:24,730 --> 00:50:27,640 So remember, the memory system, if you're 1093 00:50:27,640 --> 00:50:30,190 [UNINTELLIGIBLE], if you look at cache, what they're doing 1094 00:50:30,190 --> 00:50:33,130 is we have a small amount of memory called cache that are 1095 00:50:33,130 --> 00:50:34,280 very fast access. 1096 00:50:34,280 --> 00:50:37,420 And you have a large amount of memory that is sitting out 1097 00:50:37,420 --> 00:50:39,260 that has slow access. 1098 00:50:39,260 --> 00:50:42,040 So what you want to give user feel as if they have a lot of 1099 00:50:42,040 --> 00:50:44,460 memory, everything is very fast access. 1100 00:50:44,460 --> 00:50:47,070 So hardware does a lot of crazy things to give you the 1101 00:50:47,070 --> 00:50:50,940 illusion that you have very fast memory and you have a 1102 00:50:50,940 --> 00:50:52,340 huge amount of that. 1103 00:50:52,340 --> 00:50:54,740 Of course, that's not true, because you have these caches. 1104 00:50:54,740 --> 00:50:57,070 And what happens is, if you do things wrong, 1105 00:50:57,070 --> 00:50:59,150 these get to slow down. 1106 00:50:59,150 --> 00:51:02,070 So in the cache system, you have cache [UNINTELLIGIBLE] 1107 00:51:02,070 --> 00:51:06,610 You have each multicore going to its own L1 cache that goes 1108 00:51:06,610 --> 00:51:09,670 L2 cache, which goes to L3 cache, which goes to the main 1109 00:51:09,670 --> 00:51:11,150 memory system. 1110 00:51:11,150 --> 00:51:13,890 And you want all of the data to be here. 1111 00:51:13,890 --> 00:51:15,640 If that's the case, things run very fast. 1112 00:51:15,640 --> 00:51:16,670 If not, you have to go here. 1113 00:51:16,670 --> 00:51:18,470 It will be a lot more slow, here, even 1114 00:51:18,470 --> 00:51:20,460 slower, here, very slow. 1115 00:51:20,460 --> 00:51:26,600 So the key thing is to get the data as much as possible here. 1116 00:51:26,600 --> 00:51:28,400 So here's the kind of cycles, basically. 1117 00:51:28,400 --> 00:51:30,470 One cycle to get here-- three cycles. 1118 00:51:30,470 --> 00:51:32,760 If you go to L2 to L3, you have to do 14 cycles 1119 00:51:32,760 --> 00:51:33,490 [UNINTELLIGIBLE]. 1120 00:51:33,490 --> 00:51:36,440 If you go to L3 to main memory, you might sometimes 1121 00:51:36,440 --> 00:51:38,830 get hundreds of cycles delaying here, 1122 00:51:38,830 --> 00:51:40,080 so it's very costly. 1123 00:51:40,080 --> 00:51:41,990 1124 00:51:41,990 --> 00:51:44,800 So the caches are very temperamental beasts. 1125 00:51:44,800 --> 00:51:48,190 They work beautifully, but when they don't work, these go 1126 00:51:48,190 --> 00:51:48,880 really haywire. 1127 00:51:48,880 --> 00:51:52,240 So a lot of times, if there's a performance issue, look at 1128 00:51:52,240 --> 00:51:53,550 the cache miss rate and say, aha, the 1129 00:51:53,550 --> 00:51:54,830 caches are not behaving. 1130 00:51:54,830 --> 00:51:57,410 And can I help in here? 1131 00:51:57,410 --> 00:51:59,210 So the interesting way to look at that is 1132 00:51:59,210 --> 00:52:01,460 how much data I touch. 1133 00:52:01,460 --> 00:52:03,350 These graphics didn't show up. 1134 00:52:03,350 --> 00:52:06,190 1135 00:52:06,190 --> 00:52:07,130 Let me explain. 1136 00:52:07,130 --> 00:52:13,070 So if you do matrix multiply, I am trying to calculate one 1137 00:52:13,070 --> 00:52:14,320 value in here. 1138 00:52:14,320 --> 00:52:19,570 1139 00:52:19,570 --> 00:52:20,880 So this is one value. 1140 00:52:20,880 --> 00:52:22,480 I have to calculate row and the column. 1141 00:52:22,480 --> 00:52:25,130 1142 00:52:25,130 --> 00:52:27,020 This slide got screwed up a little bit. 1143 00:52:27,020 --> 00:52:29,905 So just look at this calculating [? row bar. ?] 1144 00:52:29,905 --> 00:52:33,340 So if I want to calculate a row of values here, what I 1145 00:52:33,340 --> 00:52:36,570 have to do is I have to get this row in B, and I have to 1146 00:52:36,570 --> 00:52:41,510 use entire C to calculate one row of A. That means I have 1147 00:52:41,510 --> 00:52:45,700 touched this much data items just to get one row of A. So 1148 00:52:45,700 --> 00:52:49,000 that means I am calculating 1,024 values of A. I am 1149 00:52:49,000 --> 00:52:52,370 touching this much data items to get that. 1150 00:52:52,370 --> 00:52:54,920 On the other hand, if I do block matrix multiply, that 1151 00:52:54,920 --> 00:52:58,260 means I calculate, basically, a block in here. 1152 00:52:58,260 --> 00:52:59,940 This block is also 32 by 32. 1153 00:52:59,940 --> 00:53:03,870 It has 1,024 elements in here. 1154 00:53:03,870 --> 00:53:06,230 But I only need a block here and block here. 1155 00:53:06,230 --> 00:53:09,780 Voila, I only have looked at 25,000 elements. 1156 00:53:09,780 --> 00:53:13,770 I got the same number of output calculated. 1157 00:53:13,770 --> 00:53:16,830 But I have only touched much less elements altogether to 1158 00:53:16,830 --> 00:53:19,700 calculate that. 1159 00:53:19,700 --> 00:53:20,230 Do you see that? 1160 00:53:20,230 --> 00:53:24,110 And nice thing is, if this thing fits in the cache, I am 1161 00:53:24,110 --> 00:53:26,470 much better than this might not fit in the cache. 1162 00:53:26,470 --> 00:53:30,510 So I can calculate blocks in matrix multiply by looking at 1163 00:53:30,510 --> 00:53:33,965 a lot less number of data, that has a good chance at 1164 00:53:33,965 --> 00:53:35,870 fitting in cache-- and of course, when you calculate. 1165 00:53:35,870 --> 00:53:37,260 you make sure it fits in cache-- 1166 00:53:37,260 --> 00:53:40,610 than trying to just calculate row by row. 1167 00:53:40,610 --> 00:53:43,140 So one thing you can do is-- 1168 00:53:43,140 --> 00:53:46,950 this is a lot of common things we do. 1169 00:53:46,950 --> 00:53:48,680 There's many ways to get same results. 1170 00:53:48,680 --> 00:53:50,950 You can change the execution order. 1171 00:53:50,950 --> 00:53:51,710 You can change the algorithm. 1172 00:53:51,710 --> 00:53:53,190 You contain data structures. 1173 00:53:53,190 --> 00:53:55,340 And some of them actually might do 1174 00:53:55,340 --> 00:53:56,080 better than the other. 1175 00:53:56,080 --> 00:53:58,285 And here, we can actually say we can do the execution of the 1176 00:53:58,285 --> 00:54:01,790 change and make sure we are only looking at small amount 1177 00:54:01,790 --> 00:54:05,030 of data, you get the same number of data calculated. 1178 00:54:05,030 --> 00:54:10,350 And of course, sometimes you have to be careful, because if 1179 00:54:10,350 --> 00:54:12,590 you do things in different order, you might not get the 1180 00:54:12,590 --> 00:54:13,840 exact same results. 1181 00:54:13,840 --> 00:54:17,260 1182 00:54:17,260 --> 00:54:20,400 So if you are a numerical [UNINTELLIGIBLE], you say, OK, 1183 00:54:20,400 --> 00:54:22,940 of course, if you're doing A plus B plus C, what order you 1184 00:54:22,940 --> 00:54:26,330 do will make a difference [UNINTELLIGIBLE]. 1185 00:54:26,330 --> 00:54:28,420 So if you're really careful, you can't do too many things. 1186 00:54:28,420 --> 00:54:32,250 But most times, you can do some of these things. 1187 00:54:32,250 --> 00:54:35,070 And in fact, in our final thing, we might say-- 1188 00:54:35,070 --> 00:54:35,920 the final project-- 1189 00:54:35,920 --> 00:54:38,580 we will give you something that you can change the 1190 00:54:38,580 --> 00:54:41,920 algorithm as long as the outcome of the [? image ?] 1191 00:54:41,920 --> 00:54:43,206 is the same. 1192 00:54:43,206 --> 00:54:44,910 If you look at the [? image, ?] and the 1193 00:54:44,910 --> 00:54:46,820 [? image ?] outcome is the same, you 1194 00:54:46,820 --> 00:54:47,910 can go change things. 1195 00:54:47,910 --> 00:54:52,020 So a lot of times, you might have a lot of freedom to 1196 00:54:52,020 --> 00:54:57,560 change the amount of errors you get, the precision, and, 1197 00:54:57,560 --> 00:55:00,780 in fact, doing that, that you can actually get a lot of good 1198 00:55:00,780 --> 00:55:01,810 performance. 1199 00:55:01,810 --> 00:55:03,930 So in here, we are not trying to change the matrix 1200 00:55:03,930 --> 00:55:04,690 multiply, of course. 1201 00:55:04,690 --> 00:55:07,750 But we are going to change the way order of operations is 1202 00:55:07,750 --> 00:55:08,160 being done. 1203 00:55:08,160 --> 00:55:10,310 So some people say you can't do that. 1204 00:55:10,310 --> 00:55:12,040 It would make matrix multiply different. 1205 00:55:12,040 --> 00:55:14,420 But in most cases, it's OK. 1206 00:55:14,420 --> 00:55:17,690 So what we have done is done what we call 1207 00:55:17,690 --> 00:55:18,800 block matrix multiply. 1208 00:55:18,800 --> 00:55:20,240 So instead of calculating rows, you 1209 00:55:20,240 --> 00:55:21,525 calculate blocks at a time. 1210 00:55:21,525 --> 00:55:23,400 So calculating each block, you only need a 1211 00:55:23,400 --> 00:55:24,240 few amount of data. 1212 00:55:24,240 --> 00:55:27,340 So this is block matrix multiply. 1213 00:55:27,340 --> 00:55:32,180 And voila, by doing block matrix multiply, I got even 1214 00:55:32,180 --> 00:55:34,510 1.7x performance gain from that. 1215 00:55:34,510 --> 00:55:35,680 Now, it's very interesting. 1216 00:55:35,680 --> 00:55:38,690 Now with each block cycle, I am doing two operations. 1217 00:55:38,690 --> 00:55:39,580 So my [? superscale ?] 1218 00:55:39,580 --> 00:55:40,780 is finally coming to being. 1219 00:55:40,780 --> 00:55:41,900 So every clock cycle, I am doing 1220 00:55:41,900 --> 00:55:44,880 multiple operations here. 1221 00:55:44,880 --> 00:55:46,860 So this is very interesting, the number here. 1222 00:55:46,860 --> 00:55:50,540 So now when you look at what happens here, I am actually 1223 00:55:50,540 --> 00:55:53,730 executing a little bit more instructions here. 1224 00:55:53,730 --> 00:55:54,350 Why? 1225 00:55:54,350 --> 00:55:55,890 Because I did block matrix multiply. 1226 00:55:55,890 --> 00:55:58,020 That means I am doing a little bit more overhead in here. 1227 00:55:58,020 --> 00:55:59,580 I have more loops in here. 1228 00:55:59,580 --> 00:56:00,510 My inner loops are small. 1229 00:56:00,510 --> 00:56:03,820 So I'm doing more instructions to do execute there. 1230 00:56:03,820 --> 00:56:07,160 So I'm doing more instructions, but I have a 1231 00:56:07,160 --> 00:56:10,980 huge cache performance gain. 1232 00:56:10,980 --> 00:56:14,360 My cache miss rate now going down to very, very little-- 1233 00:56:14,360 --> 00:56:16,160 0.02%. 1234 00:56:16,160 --> 00:56:18,720 And I have an 8x improvement in cache, and that means I got 1235 00:56:18,720 --> 00:56:23,420 a 3x improvement in my CPI in here. 1236 00:56:23,420 --> 00:56:24,350 That's very nice. 1237 00:56:24,350 --> 00:56:27,210 That means even though I'm doing a little bit more work, 1238 00:56:27,210 --> 00:56:30,870 my cache behavior really, really helped me in here. 1239 00:56:30,870 --> 00:56:35,940 So kind of says why I am getting this performance gain. 1240 00:56:35,940 --> 00:56:38,250 So what else can we do? 1241 00:56:38,250 --> 00:56:42,270 We can go look at trying to do a lot more crazy optimizations 1242 00:56:42,270 --> 00:56:44,690 because if you look at the modern Intel, they have this 1243 00:56:44,690 --> 00:56:45,980 thing called [? retro ?] instructions. 1244 00:56:45,980 --> 00:56:49,040 We'll get a little bit more detail into that later. 1245 00:56:49,040 --> 00:56:57,120 And what we can do sometimes is you can nudge the compiler 1246 00:56:57,120 --> 00:56:59,830 to take advantage of these instructions. 1247 00:56:59,830 --> 00:57:02,560 Of course, if nothing helps, you can actually go write to 1248 00:57:02,560 --> 00:57:03,420 assembly [UNINTELLIGIBLE]. 1249 00:57:03,420 --> 00:57:06,500 But hey, I don't want you guys to write assembly code, at 1250 00:57:06,500 --> 00:57:09,070 least in this class, even though you can get really good 1251 00:57:09,070 --> 00:57:10,170 performance sometimes. 1252 00:57:10,170 --> 00:57:12,750 But you can kind of give enough hints to compiler and 1253 00:57:12,750 --> 00:57:16,340 pray and kind of [UNINTELLIGIBLE] 1254 00:57:16,340 --> 00:57:18,600 there's no nice way on these things, and hope for the 1255 00:57:18,600 --> 00:57:19,892 compiler to do the right thing. 1256 00:57:19,892 --> 00:57:22,460 1257 00:57:22,460 --> 00:57:25,800 So in here, we can play with, a lot of 1258 00:57:25,800 --> 00:57:27,720 times, compiler flags. 1259 00:57:27,720 --> 00:57:29,650 And this is not exact size. 1260 00:57:29,650 --> 00:57:32,250 We keep trying those things, and sometimes we can even get 1261 00:57:32,250 --> 00:57:35,430 information back and say, OK, what did the compiler do? 1262 00:57:35,430 --> 00:57:37,900 It might give you some good hints and say, I couldn't do 1263 00:57:37,900 --> 00:57:40,750 something because of this part of the code. 1264 00:57:40,750 --> 00:57:44,785 And worst comes to worst, you can actually generate assembly 1265 00:57:44,785 --> 00:57:47,500 and stare at it. 1266 00:57:47,500 --> 00:57:49,330 And sometimes in this class, you might actually want to 1267 00:57:49,330 --> 00:57:53,770 stare at assembly and see what goes on in here. 1268 00:57:53,770 --> 00:57:55,140 So here is the assembly for here. 1269 00:57:55,140 --> 00:57:58,710 And this is my loop body in here. 1270 00:57:58,710 --> 00:58:01,250 And then what I can see, I'm not going to go through that. 1271 00:58:01,250 --> 00:58:06,180 Everything is converted into SSE instructions. 1272 00:58:06,180 --> 00:58:08,930 At least I know what SSE instructions mean, so this 1273 00:58:08,930 --> 00:58:12,190 seems to be doing well, so I'm happy. 1274 00:58:12,190 --> 00:58:15,330 And in fact, I'm really happy because I got a 2.8x 1275 00:58:15,330 --> 00:58:17,000 improvement by doing this. 1276 00:58:17,000 --> 00:58:19,190 Now what I am actually doing each clock cycle about 1277 00:58:19,190 --> 00:58:19,650 [? five of my ?] 1278 00:58:19,650 --> 00:58:20,410 operations. 1279 00:58:20,410 --> 00:58:22,120 So I'm going really fast in here. 1280 00:58:22,120 --> 00:58:26,950 So now, incidentally, I'm running my program about 1281 00:58:26,950 --> 00:58:30,360 30,000x faster than when I started. 1282 00:58:30,360 --> 00:58:32,050 So if you haven't noticed, these things 1283 00:58:32,050 --> 00:58:34,960 slowly add up in here. 1284 00:58:34,960 --> 00:58:37,390 And now, if we look at what's going on here, it's very 1285 00:58:37,390 --> 00:58:38,910 interesting. 1286 00:58:38,910 --> 00:58:42,970 What I have done is actually, my each instruction is now CPI 1287 00:58:42,970 --> 00:58:44,280 has gone down. 1288 00:58:44,280 --> 00:58:46,890 That means each instruction is running slower because I'm 1289 00:58:46,890 --> 00:58:49,130 doing these SSE instructions. 1290 00:58:49,130 --> 00:58:53,680 And in fact, my cache miss rate has also gone up. 1291 00:58:53,680 --> 00:58:57,100 But I am doing a lot less instructions, because SSE 1292 00:58:57,100 --> 00:59:00,010 means in one instruction, I am doing a vector. 1293 00:59:00,010 --> 00:59:03,700 So I'm actually getting four operations done in one 1294 00:59:03,700 --> 00:59:04,130 instruction. 1295 00:59:04,130 --> 00:59:06,620 So the number of instructions I execute has 1296 00:59:06,620 --> 00:59:07,960 drastically gone down. 1297 00:59:07,960 --> 00:59:09,920 So even though my instructions are running a little bit 1298 00:59:09,920 --> 00:59:12,600 slower, I can basically amortize that, and I get good 1299 00:59:12,600 --> 00:59:14,970 performance in here. 1300 00:59:14,970 --> 00:59:16,510 And of course, this one says, OK, now 88% of the 1301 00:59:16,510 --> 00:59:17,790 instructions are SSEs. 1302 00:59:17,790 --> 00:59:18,980 So I have [UNINTELLIGIBLE] 1303 00:59:18,980 --> 00:59:22,850 most of the things to run on vector mode in here. 1304 00:59:22,850 --> 00:59:25,830 And of course, you can keep doing more and more and more. 1305 00:59:25,830 --> 00:59:28,430 And there are these four things [UNINTELLIGIBLE] matrix 1306 00:59:28,430 --> 00:59:31,560 multiply, there are people who spend their entire lifetime 1307 00:59:31,560 --> 00:59:33,050 trying to run it faster. 1308 00:59:33,050 --> 00:59:35,310 And they have libraries like that, and this thing called a 1309 00:59:35,310 --> 00:59:40,180 BLAS library and this thing called Intel Math Kernel. 1310 00:59:40,180 --> 00:59:42,790 So they have done a lot more other things, like 1311 00:59:42,790 --> 00:59:45,610 pre-fetching, getting everything right. 1312 00:59:45,610 --> 00:59:47,950 So here what I do is basically call their library, don't do 1313 00:59:47,950 --> 00:59:50,980 anything myself, and this library will do it for me. 1314 00:59:50,980 --> 00:59:54,810 And in fact, that library itself gave me another 2.7x 1315 00:59:54,810 --> 00:59:57,110 performance gain in there. 1316 00:59:57,110 --> 01:00:02,230 Now I'm actually doing 11 instructions every clock cycle 1317 01:00:02,230 --> 01:00:03,190 by doing that. 1318 01:00:03,190 --> 01:00:05,615 And if I look at it carefully, what they have done is they 1319 01:00:05,615 --> 01:00:09,995 are also very into SSE heavy, so they are running almost the 1320 01:00:09,995 --> 01:00:11,400 same amount of instructions. 1321 01:00:11,400 --> 01:00:14,240 But they managed to actually get, again, 1322 01:00:14,240 --> 01:00:16,570 a better miss rate. 1323 01:00:16,570 --> 01:00:19,310 They probably figured out some stuff in here, things like 1324 01:00:19,310 --> 01:00:20,800 pre-fetching instructions. 1325 01:00:20,800 --> 01:00:23,290 And that means they actually brought down the CPI back to 1326 01:00:23,290 --> 01:00:24,590 where it was before. 1327 01:00:24,590 --> 01:00:28,370 And that's why they gave that performance [INAUDIBLE]. 1328 01:00:28,370 --> 01:00:30,510 And finally, we are doing a little 1329 01:00:30,510 --> 01:00:33,770 bit of parallel execution. 1330 01:00:33,770 --> 01:00:35,960 So as you know, multicores are here. 1331 01:00:35,960 --> 01:00:38,220 And you're actually going to these 12-core machines, which 1332 01:00:38,220 --> 01:00:40,170 is really nice. 1333 01:00:40,170 --> 01:00:41,810 Fastest possible machines you can get your 1334 01:00:41,810 --> 01:00:42,980 hands on these days. 1335 01:00:42,980 --> 01:00:46,920 And it's kind of fun to work with them. 1336 01:00:46,920 --> 01:00:52,620 And we can basically get concurrency for parallel 1337 01:00:52,620 --> 01:00:53,140 performance. 1338 01:00:53,140 --> 01:00:56,410 And a large part of this class at the end is trying to get 1339 01:00:56,410 --> 01:00:58,290 good parallel performance. 1340 01:00:58,290 --> 01:00:59,880 But there are a lot of issues with parallelism. 1341 01:00:59,880 --> 01:01:02,270 I will go a couple of issues now. 1342 01:01:02,270 --> 01:01:05,910 Of course, when we get to that part, we will go in detail. 1343 01:01:05,910 --> 01:01:08,790 One thing is called Amdahl's Law. 1344 01:01:08,790 --> 01:01:10,920 Because if you're trying to run something parallel, you 1345 01:01:10,920 --> 01:01:13,030 can't run everything parallel. 1346 01:01:13,030 --> 01:01:15,860 When you start something, there's part of the code that 1347 01:01:15,860 --> 01:01:18,850 basically starts the computation. 1348 01:01:18,850 --> 01:01:21,580 And then there are the parts you can run parallel, and then 1349 01:01:21,580 --> 01:01:23,670 other parts you have to also wind down, 1350 01:01:23,670 --> 01:01:25,690 measure, things like that. 1351 01:01:25,690 --> 01:01:29,620 So what Amdahl's Law says is, even if you have an infinite 1352 01:01:29,620 --> 01:01:33,660 number of processors, the maximum speedup you can get is 1353 01:01:33,660 --> 01:01:36,200 this-- this the part of the code that are sequential and 1354 01:01:36,200 --> 01:01:38,100 [? parts of ?] the code are parallel. 1355 01:01:38,100 --> 01:01:42,640 That means if you have only 90% of the code can be 1356 01:01:42,640 --> 01:01:45,570 parallelized, the maximum speedup you can get is 10, 1357 01:01:45,570 --> 01:01:47,850 even if you have an infinite number of processors. 1358 01:01:47,850 --> 01:01:50,110 So 99% of the code is parallel, the maximum speedup 1359 01:01:50,110 --> 01:01:51,550 you can get is 100. 1360 01:01:51,550 --> 01:01:55,030 So this is very interesting, because even if you have huge 1361 01:01:55,030 --> 01:01:59,090 [UNINTELLIGIBLE], if your code has a certain amount of 1362 01:01:59,090 --> 01:02:04,030 sequential part, there is a limit of what you can do. 1363 01:02:04,030 --> 01:02:04,740 [UNINTELLIGIBLE] 1364 01:02:04,740 --> 01:02:06,440 also think of load balancing. 1365 01:02:06,440 --> 01:02:09,320 That means when you get parallel, what you are 1366 01:02:09,320 --> 01:02:12,890 assuming is all processes are busy all the time. 1367 01:02:12,890 --> 01:02:14,720 That's very hard to achieve. 1368 01:02:14,720 --> 01:02:16,100 Sometimes some processes are busy. 1369 01:02:16,100 --> 01:02:17,490 Sometimes others are not. 1370 01:02:17,490 --> 01:02:19,050 Sometimes something's finished early. 1371 01:02:19,050 --> 01:02:21,550 So if that happens, even though you have parallels, you 1372 01:02:21,550 --> 01:02:25,740 might not get really good performance in here. 1373 01:02:25,740 --> 01:02:27,350 And then, of course, there's a thing called granularity of 1374 01:02:27,350 --> 01:02:27,960 parallelism. 1375 01:02:27,960 --> 01:02:29,440 Normally, what happens in programming, you run 1376 01:02:29,440 --> 01:02:31,540 sequential part, you run parallel part, you run 1377 01:02:31,540 --> 01:02:33,090 sequential parallel part. 1378 01:02:33,090 --> 01:02:35,070 To start parallelism is a big task. 1379 01:02:35,070 --> 01:02:37,720 You have to tell all the other processors, OK, look, you guys 1380 01:02:37,720 --> 01:02:38,800 have some work. 1381 01:02:38,800 --> 01:02:41,960 Go do that, and finish that, and come back. 1382 01:02:41,960 --> 01:02:44,270 And that itself is very expensive. 1383 01:02:44,270 --> 01:02:46,760 And the work you give is very small. 1384 01:02:46,760 --> 01:02:47,500 You give the work. 1385 01:02:47,500 --> 01:02:50,570 The start-up and tear-down cost of parallelism itself 1386 01:02:50,570 --> 01:02:53,330 might be too much than the speedup you get. 1387 01:02:53,330 --> 01:02:57,080 In fact, you can make programs parallel and slow them down 1388 01:02:57,080 --> 01:02:58,410 really drastically. 1389 01:02:58,410 --> 01:03:00,680 So you have to be very careful in that, because you can say, 1390 01:03:00,680 --> 01:03:02,400 oh, I'm running in parallel, but it's running slow. 1391 01:03:02,400 --> 01:03:03,590 Why? 1392 01:03:03,590 --> 01:03:06,005 Because you might have a granularity issue, because we 1393 01:03:06,005 --> 01:03:09,470 are giving so little work that even though you're getting 1394 01:03:09,470 --> 01:03:11,220 parallelism, you might not get that much performance. 1395 01:03:11,220 --> 01:03:13,840 1396 01:03:13,840 --> 01:03:18,060 So if you look at matrix multiply, what happens here is 1397 01:03:18,060 --> 01:03:21,460 you can divide the matrix into two different parts in here 1398 01:03:21,460 --> 01:03:26,270 and calculate as two different matrix multiplies in a 1399 01:03:26,270 --> 01:03:29,220 different core or a different process. 1400 01:03:29,220 --> 01:03:31,580 Or in here, I can make it two or whatever in matrix 1401 01:03:31,580 --> 01:03:32,650 multiplies to do that. 1402 01:03:32,650 --> 01:03:34,360 So this is nicely parallelized. 1403 01:03:34,360 --> 01:03:36,125 No big issue in here. 1404 01:03:36,125 --> 01:03:40,030 And when I parallelize, I got rather 3.5x performance 1405 01:03:40,030 --> 01:03:41,130 improvement. 1406 01:03:41,130 --> 01:03:46,120 Of course, I could have gotten more here, but because I had 1407 01:03:46,120 --> 01:03:49,070 to start here, since I didn't want to wait for a week for 1408 01:03:49,070 --> 01:03:51,910 this to finish, I had to come up with a smaller matrix size. 1409 01:03:51,910 --> 01:03:55,270 But now, when I get here, this is so fast, my 1410 01:03:55,270 --> 01:03:57,080 granularity is too small. 1411 01:03:57,080 --> 01:03:58,620 So that's why I didn't get-- 1412 01:03:58,620 --> 01:04:00,790 because this was an eight-core machine, I didn't get 8x 1413 01:04:00,790 --> 01:04:03,690 because the minute I get parallelism, it just is over 1414 01:04:03,690 --> 01:04:06,550 because it's running so fast in here. 1415 01:04:06,550 --> 01:04:08,800 Now, at this point, using eight cores, I am actually 1416 01:04:08,800 --> 01:04:09,510 getting 36. 1417 01:04:09,510 --> 01:04:12,470 So this is not that interesting, because this is 1418 01:04:12,470 --> 01:04:15,825 not running on one core, every clock cycle, I am trying to 1419 01:04:15,825 --> 01:04:18,580 get 36 operations done. 1420 01:04:18,580 --> 01:04:25,480 So the interesting thing to follow in here is that this 1421 01:04:25,480 --> 01:04:28,610 example might be somewhat contrived, because I started 1422 01:04:28,610 --> 01:04:31,380 with this immutable-type matrix multiply. 1423 01:04:31,380 --> 01:04:35,180 Hopefully, you and your colleagues who haven't taken 1424 01:04:35,180 --> 01:04:38,060 this class probably won't go that far. 1425 01:04:38,060 --> 01:04:42,220 But even if you start somewhere here, still, there's 1426 01:04:42,220 --> 01:04:45,610 a huge gain to be made. 1427 01:04:45,610 --> 01:04:49,450 So to put this in perspective, OK, so if you look at here, 1428 01:04:49,450 --> 01:04:55,310 even if I start here somewhere out in C, I already got 131x 1429 01:04:55,310 --> 01:04:56,930 performance improvement. 1430 01:04:56,930 --> 01:04:58,880 That's huge. 1431 01:04:58,880 --> 01:05:00,980 So that means in this class, we are not 1432 01:05:00,980 --> 01:05:02,080 talking about small things. 1433 01:05:02,080 --> 01:05:06,810 And in fact, this is why I had to go log scale in doing 1434 01:05:06,810 --> 01:05:10,050 absolute grades, because we had people who figured out 1435 01:05:10,050 --> 01:05:13,160 exactly how, in some problems, to change data structures, 1436 01:05:13,160 --> 01:05:16,080 change execution ordering, do some really cool thing, get 1437 01:05:16,080 --> 01:05:18,050 1,000x performance improvement. 1438 01:05:18,050 --> 01:05:20,100 And there are other people who did little bit things and got 1439 01:05:20,100 --> 01:05:22,630 2x improvement. 1440 01:05:22,630 --> 01:05:25,890 So the fact that what you can do the best here can be 1441 01:05:25,890 --> 01:05:28,050 really, really high. 1442 01:05:28,050 --> 01:05:30,580 So that's why we actually started giving you the second 1443 01:05:30,580 --> 01:05:34,330 chance to also go basically see, if you missed a critical 1444 01:05:34,330 --> 01:05:38,110 point, to go back and learn that. 1445 01:05:38,110 --> 01:05:40,820 So to put this in perspective-- 1446 01:05:40,820 --> 01:05:44,230 so in summary, what we have done is-- 1447 01:05:44,230 --> 01:05:47,850 matrix multiply might be an exception. 1448 01:05:47,850 --> 01:05:50,565 Every time you go, you might not get that much, but in 1449 01:05:50,565 --> 01:05:54,080 here, going from this very software-engineered, nice 1450 01:05:54,080 --> 01:05:54,940 [UNINTELLIGIBLE] 1451 01:05:54,940 --> 01:06:00,890 to a BLAS [UNINTELLIGIBLE], I got a 296,000 times 1452 01:06:00,890 --> 01:06:01,960 performance improvement. 1453 01:06:01,960 --> 01:06:04,890 This is huge. 1454 01:06:04,890 --> 01:06:08,100 If you put this in perspective, if you look at 1455 01:06:08,100 --> 01:06:10,920 miles per gallon between these two-- a 1456 01:06:10,920 --> 01:06:13,285 supertanker and this scooter-- 1457 01:06:13,285 --> 01:06:16,520 it's only 14,000x, basically. 1458 01:06:16,520 --> 01:06:19,900 So what that means is you have to have 20 supertankers to 1459 01:06:19,900 --> 01:06:22,000 basically match this much improvement. 1460 01:06:22,000 --> 01:06:24,020 So if somebody can tell you, look, I can't get a 1461 01:06:24,020 --> 01:06:27,000 supertanker to run at the same miles per gallon as your 1462 01:06:27,000 --> 01:06:28,840 scooter, that would be revolutionary. 1463 01:06:28,840 --> 01:06:31,800 1464 01:06:31,800 --> 01:06:34,830 But you can do something similar in your computer. 1465 01:06:34,830 --> 01:06:37,390 And in fact, if you look at these large data centers, the 1466 01:06:37,390 --> 01:06:39,630 computers are actually burning a lot of 1467 01:06:39,630 --> 01:06:42,050 energy, a lot of cost. 1468 01:06:42,050 --> 01:06:44,550 And in fact, if you can reduce that, that can have impact. 1469 01:06:44,550 --> 01:06:48,590 So what you get out of this class is to think through 1470 01:06:48,590 --> 01:06:50,840 these programs this way. 1471 01:06:50,840 --> 01:06:52,890 Learn about correctness before [UNINTELLIGIBLE] 1472 01:06:52,890 --> 01:06:56,840 6.170, you hopefully are good at writing correct programs. 1473 01:06:56,840 --> 01:06:58,840 But that's not good enough in a lot of times. 1474 01:06:58,840 --> 01:07:01,320 And how to think about programs, how to make it run 1475 01:07:01,320 --> 01:07:04,080 faster, there are multiple [UNINTELLIGIBLE] necessary. 1476 01:07:04,080 --> 01:07:06,880 So for example, in here, it's basically direct energy. 1477 01:07:06,880 --> 01:07:13,720 So can you get the data center to burn 10% less energy, 1478 01:07:13,720 --> 01:07:14,970 that's huge. 1479 01:07:14,970 --> 01:07:17,010 1480 01:07:17,010 --> 01:07:21,270 For example, if you look at today's supercomputing 1481 01:07:21,270 --> 01:07:26,410 centers, they spend about $10 million a year on energy, just 1482 01:07:26,410 --> 01:07:27,360 for the power grid. 1483 01:07:27,360 --> 01:07:29,010 So if you can say, I'm [? playing with ?] 1484 01:07:29,010 --> 01:07:31,060 10%, that's huge. 1485 01:07:31,060 --> 01:07:38,140 Other thing, one thing Charles says a lot of times is, what 1486 01:07:38,140 --> 01:07:42,730 performance gives you is currency, because performance 1487 01:07:42,730 --> 01:07:45,800 itself might not be that directly useful, but it gives 1488 01:07:45,800 --> 01:07:48,330 you currency to buy something else. 1489 01:07:48,330 --> 01:07:49,030 What it is? 1490 01:07:49,030 --> 01:07:51,890 So assume you have a nice application. 1491 01:07:51,890 --> 01:07:54,620 Your GUI is doing OK. 1492 01:07:54,620 --> 01:07:58,630 It's up to the point that people don't feel stalled. 1493 01:07:58,630 --> 01:08:01,160 But if you want to do something addition, if you add 1494 01:08:01,160 --> 01:08:04,420 a feature, you don't want to slow down the program too 1495 01:08:04,420 --> 01:08:06,630 much, because your users are going to complain. 1496 01:08:06,630 --> 01:08:08,710 So what performance gives you is, if you improve the 1497 01:08:08,710 --> 01:08:11,310 performance, it gives you currency, the ability to buy 1498 01:08:11,310 --> 01:08:13,530 some additional features into your system. 1499 01:08:13,530 --> 01:08:17,029 Or you have a system that you created a start up, and you 1500 01:08:17,029 --> 01:08:19,250 created this nice server somewhere. 1501 01:08:19,250 --> 01:08:21,850 Suddenly, it became very popular, and a million people 1502 01:08:21,850 --> 01:08:23,829 want to use your software. 1503 01:08:23,829 --> 01:08:27,340 1504 01:08:27,340 --> 01:08:30,080 And the software might not be ready to do that. 1505 01:08:30,080 --> 01:08:32,680 It might just crash and burn, or you might have to pay a lot 1506 01:08:32,680 --> 01:08:36,569 of money for Amazon for the [? cycles, ?] 1507 01:08:36,569 --> 01:08:39,500 so by actually making programs faster, you can make it 1508 01:08:39,500 --> 01:08:41,859 scalable, you can make it efficient. 1509 01:08:41,859 --> 01:08:45,970 So this gives you the ability, this currency, to go do a lot 1510 01:08:45,970 --> 01:08:48,410 more other interesting things to your software because now 1511 01:08:48,410 --> 01:08:49,600 you can run it faster. 1512 01:08:49,600 --> 01:08:52,870 It gives you this margin to play with. 1513 01:08:52,870 --> 01:08:55,500 So in this class, we are going to learn a 1514 01:08:55,500 --> 01:08:56,370 lot about these things. 1515 01:08:56,370 --> 01:09:00,319 We are going to learn about architecture, how looking at 1516 01:09:00,319 --> 01:09:02,560 the architectural issues, how to identify when 1517 01:09:02,560 --> 01:09:03,990 something is wrong. 1518 01:09:03,990 --> 01:09:05,670 We are going to look at algorithmic issues in here, 1519 01:09:05,670 --> 01:09:06,950 how to look at things. 1520 01:09:06,950 --> 01:09:08,439 We are looking at architectural things like 1521 01:09:08,439 --> 01:09:10,700 memory systems, parallelism. 1522 01:09:10,700 --> 01:09:14,560 And also we will talk about all the tools, so you will 1523 01:09:14,560 --> 01:09:15,970 actually know something has gone wrong. 1524 01:09:15,970 --> 01:09:19,399 Because today, you run the program, it runs, it runs 1525 01:09:19,399 --> 01:09:21,609 correctly, is it good enough or not? 1526 01:09:21,609 --> 01:09:22,970 How do you identify that? 1527 01:09:22,970 --> 01:09:25,569 That's, I think, the very interesting place to start. 1528 01:09:25,569 --> 01:09:29,450 So that's all I have today. 1529 01:09:29,450 --> 01:09:32,710 Make sure you get a Project 0. 1530 01:09:32,710 --> 01:09:37,720 And especially people who haven't done C, we will-- 1531 01:09:37,720 --> 01:09:39,692 Charles, are we going to do a C remediation? 1532 01:09:39,692 --> 01:09:42,479 1533 01:09:42,479 --> 01:09:44,420 We did last time, didn't we? 1534 01:09:44,420 --> 01:09:46,399 OK, guys, one more thing. 1535 01:09:46,399 --> 01:09:50,380 Last year, we did an evening C remediation class. 1536 01:09:50,380 --> 01:09:53,510 So if you know Java, if you want to know C, we will send 1537 01:09:53,510 --> 01:09:58,420 out a time with TAs that will give you kind of a one-hour 1538 01:09:58,420 --> 01:10:02,400 crash course into C in there. 1539 01:10:02,400 --> 01:10:04,170 And we will send information. 1540 01:10:04,170 --> 01:10:06,370 So this class moves very fast. 1541 01:10:06,370 --> 01:10:10,160 Get on with Project 0, and see you next week. 1542 01:10:10,160 --> 01:10:11,410